Eric Jang – Building AlphaGo from scratch
Podcast · 2 hr 37 min
Note: AI-generated summary based on third-party content. Not financial advice.
Quick Insights

  • Maintain long-term exposure to Alphabet (GOOGL) as it pivots from consumer AI toward solving "intractable," high-value problems in biology and physics via AlphaFold and AlphaTensor.
  • NVIDIA (NVDA) remains a high-conviction play due to "inference scaling," a trend in which AI models require massive, sustained compute to "think" during the reasoning phase, not just during initial training.
  • Look for opportunities in automated AI research software that verifies autonomous scientific discoveries, as AI begins to replace junior research engineers in hyperparameter optimization.
  • The rapid commoditization of AI training (frontier capabilities that once cost millions can now be replicated for under $10,000) suggests a shift in value toward companies with proprietary data quality rather than raw compute.
  • In robotics, watch for firms using foundation models and DAgger-style algorithms to amortize complex physical movements into efficient, real-time neural network passes.

Detailed Analysis

The following investment insights are extracted from the discussion between Eric Jang (VP of AI at 1x Technologies, formerly Google DeepMind) and Dwarkesh Patel regarding the mechanics of AlphaGo, the evolution of AI research, and the future of automated discovery.


Google DeepMind (GOOGL)

The discussion highlights the historical and ongoing significance of DeepMind’s research. While AlphaGo was a landmark achievement, the conversation suggests that the "moat" of such research is shifting from raw compute to algorithmic efficiency and data quality.

Takeaways

  • Research Legacy as a Foundation: DeepMind’s work on games (Go, StarCraft) has provided a "tech tree" of capabilities (like TPU scaling and reinforcement learning) that are now being applied to Large Language Models (LLMs) and physical sciences.
  • AlphaFold and Scientific AI: The same principles used in AlphaGo (compressing complex search problems into neural network forward passes) are being applied to AlphaFold (protein folding) and AlphaTensor. This suggests Google's long-term value lies in its ability to solve "intractable" physical and biological problems, not just consumer AI.
  • Efficiency Gains: What cost DeepMind millions of dollars and massive TPU pods in 2016 can now be replicated for roughly $3,000 - $10,000 in rented compute, indicating a rapid commoditization of once-frontier AI capabilities.

NVIDIA (NVDA)

The "Bitter Lesson" of AI—that general methods leveraging compute eventually outperform human-designed "tricks"—is a central theme. The transcript emphasizes how hardware improvements have simplified AI development.

Takeaways

  • Hardware-Driven Simplification: Modern GPUs (like the H100 or Blackwell architectures) allow researchers to bypass complex, "clever" algorithmic tricks used in 2017-2020. Increased compute power allows for simpler, more robust training recipes.
  • Inference Scaling: The discussion touches on "test-time scaling," where spending more compute during the "thinking" phase (inference) can compensate for less training. This suggests a sustained, long-term demand for high-end chips not just for training, but for sophisticated reasoning agents.
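As a rough illustration of test-time scaling, here is a toy sketch: a "model" whose single answer is noisy improves when we spend more inference compute sampling N candidates and keeping the one it scores highest (best-of-N). The task, numbers, and scoring noise are all made up for illustration, not taken from the episode.

```python
import random

rng = random.Random(0)
TRUE_ANSWER = 42

def noisy_answer():
    answer = rng.gauss(TRUE_ANSWER, 10)                    # noisy guess
    score = -abs(answer - TRUE_ANSWER) + rng.gauss(0, 1)   # imperfect self-check
    return answer, score

def solve(n_samples):
    # More inference-time compute = more sampled candidates to choose among.
    return max((noisy_answer() for _ in range(n_samples)), key=lambda c: c[1])[0]

trials = 200
err1 = sum(abs(solve(1) - TRUE_ANSWER) for _ in range(trials)) / trials
err64 = sum(abs(solve(64) - TRUE_ANSWER) for _ in range(trials)) / trials
print(f"mean error with 1 sample: {err1:.2f}; with 64 samples: {err64:.2f}")
```

The point of the sketch: the same fixed model gets more accurate simply by "thinking" longer at inference, which is the demand driver the discussion attributes to reasoning-style models.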

1x Technologies (Private)

As Eric Jang was most recently the VP of AI at 1x, the conversation bridges the gap between game AI and robotics.

Takeaways

  • Robotics and DAgger: The discussion links AlphaGo’s training to DAgger (Dataset Aggregation) in robotics, in which an agent learns to correct itself when it drifts off the expert's path.
  • Foundation Models for Physical Tasks: The insight that a 10-layer network can "amortize" deep search suggests that humanoid robots may not need to "calculate" every movement in real-time if they are trained on enough simulated "search" data.
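A minimal sketch of the DAgger idea in a toy 1-D world: the learner acts with its current policy, drifts off course, and the expert then labels the states the learner actually visited with corrective actions. The environment, goal, and "training by lookup" are illustrative simplifications, not from the episode.

```python
# Toy DAgger (Dataset Aggregation) sketch. All details here are illustrative.
GOAL = 10

def expert_action(state):
    return 1 if state < GOAL else -1  # expert always steps toward the goal

def rollout(policy, steps=20):
    state, visited = 0, []
    for _ in range(steps):
        visited.append(state)
        state += policy(state)
    return visited

dataset = {}                 # state -> expert's corrective action
policy = lambda s: -1        # untrained learner: drifts away from the goal

for _ in range(5):           # DAgger iterations
    for s in rollout(policy):
        dataset[s] = expert_action(s)     # expert relabels the learner's states
    policy = lambda s: dataset.get(s, 1)  # "train" by imitating on seen states

final_states = rollout(policy)
print(final_states[-1])  # the trained learner ends up hovering near the goal
```

The key DAgger property visible here is that the expert labels the learner's own visited states, so the training distribution matches what the learner actually encounters at run time.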

Investment Themes: Automated AI Research

A major portion of the transcript focuses on the transition from humans writing code to AIs (like Claude 3.5/3.7) conducting autonomous experiments.

Takeaways

  • The "Automated Scientist": AI models are now capable of "hyperparameter optimization"—essentially grinding performance metrics to squeeze out efficiency. This reduces the need for large teams of junior research engineers.
  • Lateral Thinking Gap: Current AI models excel at executing experiments but struggle with "lateral thinking" (knowing when a research track is a dead end). Investment opportunities may lie in companies building the "outer loop" of verification (e.g., software that can autonomously verify if an AI's scientific discovery is valid).
  • Niche vs. General Models: While Transformers dominate LLMs, ResNets (Convolutional Neural Networks) are still more "compute-optimal" for specific spatial tasks like Go or certain robotics applications. This suggests a future of heterogeneous AI architectures rather than a "one-size-fits-all" model.
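The hyperparameter-grinding described above can be sketched as a simple random search. The "validation loss" below is a stand-in toy function with an optimum near lr=1e-3, batch_size=64; the search ranges and everything else are illustrative assumptions, not details from the episode.

```python
import math
import random

def validation_loss(lr, batch_size):
    # Toy stand-in for a real training run's validation loss.
    return (math.log10(lr) + 3) ** 2 + (math.log2(batch_size) - 6) ** 2

rng = random.Random(0)
best = None
for trial in range(100):
    lr = 10 ** rng.uniform(-5, -1)            # log-uniform learning rate
    bs = rng.choice([16, 32, 64, 128, 256])   # categorical batch size
    loss = validation_loss(lr, bs)
    if best is None or loss < best[0]:
        best = (loss, lr, bs)

print(f"best loss {best[0]:.3f} at lr={best[1]:.1e}, batch_size={best[2]}")
```

This inner loop (propose, run, compare a metric) is exactly the part the transcript says models can already automate; the hard part left to humans is deciding whether the experiment is worth running at all.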

Key Technical Terms for Investors

  • MCTS (Monte Carlo Tree Search): The "thinking" process that lets an AI look ahead before committing to a move. In the market, this capability is becoming known as "inference-time compute" (as in OpenAI’s o1 model).
  • Tabula Rasa Learning: Training an AI from scratch without human data. This is the "holy grail" for sectors where human data is scarce (e.g., specialized manufacturing or rare disease research).
  • Compute-Optimal Pareto Frontier: The balance between model size, data, and compute. Companies that can reach this frontier faster than others have a significant margin advantage.
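To make the MCTS term above concrete, here is a toy sketch of the selection rule at its heart (UCB1, as used in UCT): allocate more simulated rollouts to moves that look promising while still exploring uncertain ones. The moves and win probabilities are made up for illustration.

```python
import math
import random

true_win_prob = {"move_a": 0.4, "move_b": 0.7, "move_c": 0.5}  # hidden from the agent

rng = random.Random(0)
visits = {m: 0 for m in true_win_prob}
wins = {m: 0 for m in true_win_prob}

def ucb(m, t):
    if visits[m] == 0:
        return float("inf")  # try every move at least once
    # Exploitation (win rate) plus an exploration bonus for rarely tried moves.
    return wins[m] / visits[m] + math.sqrt(2 * math.log(t) / visits[m])

for t in range(1, 2001):  # each iteration is one simulated rollout
    move = max(true_win_prob, key=lambda m: ucb(m, t))
    visits[move] += 1
    wins[move] += int(rng.random() < true_win_prob[move])  # simulate a game

best = max(visits, key=visits.get)  # the most-visited move is the one played
print(best, visits)
```

Spending more iterations (more "inference-time compute") concentrates simulations on the genuinely best move, which is the mechanism behind the look-ahead "thinking" described above.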
Episode Description
Eric Jang walks through how to build AlphaGo from scratch, but with modern AI tools. Sometimes you understand the future better by stepping backward. AlphaGo is still the cleanest worked example of the primitives of intelligence: search, learning from experience, and self-play. You have to go back to 2017 to get insight on how the more general AIs of the future might learn.

Once he explained how AlphaGo works, it gave us the context to discuss how RL works in LLMs and how it could work better: naive policy-gradient RL has to figure out which move in your hundreds-of-thousands-of-token trajectory actually resulted in you getting the right answer, while AlphaGo’s MCTS suggests a strictly better action at every single move, giving you a training target that sidesteps the credit assignment problem. The way humans learn is surely closer to the second.

Eric also kickstarted an Autoresearch loop on his project. It was very interesting to discuss which parts of AI research LLMs can already automate pretty well (implementing and running experiments, optimizing hyperparameters) and which they still struggle with (choosing the right question to investigate next, escaping research dead ends). Informative to all the recent discussion about when we should expect an intelligence explosion, and what it would look like from the inside.

Watch on YouTube. Read the transcript.

Sponsors

  • Cursor's agent SDK let me build a pipeline to generate flashcards for this episode. For each card, I had an agent read the transcript, ingest blackboard screenshots, generate an SVG visual, and run everything through a critic. A durable agent is much better at this kind of work than a chain of LLM calls, and Cursor’s SDK made it easy. Check out the cards at flashcards.dwarkesh.com and get started with the SDK at cursor.com/dwarkesh
  • Jane Street gave me a real deep-dive tour of one of their datacenters. I got to ask a bunch of questions to Ron Minsky, who co-leads Jane Street’s tech group, and Dan Pontecorvo, who runs Jane Street’s physical engineering team. They were willing to literally pull up the floorboards and take out racks to explain how everything works. Check out the full tour at janestreet.com/dwarkesh

Timestamps

  • (00:00:00) – Basics of Go
  • (00:08:17) – Monte Carlo Tree Search
  • (00:32:04) – What the neural network does
  • (01:00:33) – Self-play
  • (01:25:38) – Alternative RL approaches
  • (01:45:47) – Why doesn't MCTS work for LLMs
  • (02:01:09) – Off-policy training
  • (02:12:02) – RL is even more information inefficient than you thought
  • (02:22:16) – Automated AI researchers

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
About Dwarkesh Podcast

By Dwarkesh Patel

Deeply researched interviews. www.dwarkesh.com