The next big breakthrough will be AIs learning on the job | Dwarkesh Podcast

2 hours ago • Dwarkesh Podcast • Dwarkesh Patel

Podcast • 19 min 38 sec

Note: AI-generated summary based on third-party content. Not financial advice.

Quick Insights

Investors should prioritize inference compute providers as AI shifts toward "Test-Time Training," in which models run internal simulations before acting; this shift could push inference compute demand to rival or exceed initial training compute. High-conviction opportunities lie in companies that own proprietary, high-fidelity data environments, such as Adobe (ADBE), Salesforce (CRM), and specialized CAD software providers, which act as the "simulators" AI needs to learn professional skills. Look for startups focused on Sample Efficiency and Weight Update Efficiency, since these technologies could sharply reduce the cost of training "agentic" AI by 2027. For immediate operational efficiency in the SME sector, Mercury is highlighted as an AI-integrated fintech that automates accounts payable and banking workflows. Avoid companies that rely on scraping public data from platforms like Amazon (AMZN), and instead favor those building "digital twins" or clones of the internet for private model training.

Detailed Analysis

Artificial Intelligence & LLM Infrastructure

The discussion centers on the transition from static AI models to "on-the-job" learners. The core thesis is that current AI progress is bottlenecked by sample inefficiency (needing massive amounts of data to learn) and a lack of continual learning (the ability to update internal "weights" based on real-world experience).

RLVR (Reinforcement Learning from Verifiable Rewards): AI labs are betting billions on training agents in simulated environments (coding, math) where success is easily verified.
The "Computer Use" Bottleneck: Progress in AI using computers (booking flights, filing taxes) is slower than coding because it isn't "grindable." You cannot run 1,000 parallel bots on Amazon or Google without being blocked; therefore, the industry needs high-fidelity "clones" of the internet to train models.
Context Window vs. Weights: Current models "learn" within a session (context window), but this is ephemeral. For AI to become truly productive (like a long-term employee), that knowledge must be distilled back into the model's permanent memory (weights).
OPSD (On-Policy Self-Distillation): An emerging technique to bridge the gap between temporary session learning and permanent model improvement without needing a "reward" signal for every step.
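
To make "easily verified" concrete, here is a minimal sketch of a verifiable reward for a coding task. It is an illustration only, not any lab's actual training setup: the function name `verifiable_reward`, the toy task, and the test cases are all assumptions introduced here. The idea is simply that the reward comes from an automatic check rather than from human judgment.

```python
from typing import Callable, Sequence, Tuple


def verifiable_reward(
    candidate: Callable[[int, int], int],
    test_cases: Sequence[Tuple[Tuple[int, int], int]],
) -> float:
    """Return 1.0 only if the candidate passes every test case, else 0.0."""
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            # A crashing candidate is scored as a failure; the grader keeps running.
            return 0.0
    return 1.0


# Hypothetical task: "write a function that adds two integers".
TESTS = [((1, 2), 3), ((-4, 4), 0), ((10, 5), 15)]


def correct_attempt(a: int, b: int) -> int:
    return a + b


def buggy_attempt(a: int, b: int) -> int:
    return a - b


rewards = [verifiable_reward(f, TESTS) for f in (correct_attempt, buggy_attempt)]
print(rewards)  # [1.0, 0.0]
```

Because the check is automatic, it can be applied to as many attempts as there is compute for, which is what makes domains like coding and math attractive for this kind of training.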

Takeaways

Investment Theme: Look for companies building "Farmable Deterministic Simulators." The next leg of AI growth depends on creating digital twins of the real world (Slack clones, Gmail clones, market simulators) where AI can "practice" without real-world consequences.
Compute Demand: The shift toward "Dreaming" or "Test-Time Training" (where models run simulations in their head before acting) suggests that Inference Compute demand will skyrocket, potentially rivaling or exceeding initial training compute.
Sector Opportunity: Companies that own proprietary, high-fidelity data environments (e.g., Adobe, Salesforce, or specialized CAD software) are well-positioned because they provide the "simulators" AI needs to learn specific professional skills.
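
As a rough illustration of what a "farmable deterministic simulator" could look like, the sketch below defines a toy inbox environment. Everything in it is hypothetical (the `ToyInboxSim` class, the spam-archiving task, the trivial policy); it is not a description of any real product. It shows the two properties the summary emphasizes: the same seed always produces the same episode, and episodes can be run in bulk without touching a live service.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyInboxSim:
    """Hypothetical stand-in for a 'Gmail clone': the task is to archive spam."""
    seed: int
    size: int = 5
    inbox: List[bool] = field(default_factory=list)  # True means spam
    cursor: int = 0

    def reset(self) -> bool:
        rng = random.Random(self.seed)  # same seed, same inbox, every time
        self.inbox = [rng.random() < 0.5 for _ in range(self.size)]
        self.cursor = 0
        return self.inbox[0]

    def step(self, archive: bool) -> Tuple[bool, float, bool]:
        """Apply an action; return (next observation, reward, done)."""
        reward = 1.0 if archive == self.inbox[self.cursor] else 0.0
        self.cursor += 1
        done = self.cursor >= self.size
        obs = False if done else self.inbox[self.cursor]
        return obs, reward, done


def run_episode(seed: int) -> float:
    """Roll out a trivial policy that archives exactly the messages it sees as spam."""
    env = ToyInboxSim(seed)
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(archive=obs)
        total += reward
    return total


# "Farming" the simulator: 1,000 episodes with no rate limits or bot detection.
scores = [run_episode(seed) for seed in range(1000)]
print(len(scores), run_episode(7) == run_episode(7))
```

The toy policy is handed the answer directly, so it scores perfectly; a real environment would expose only what an agent could observe on screen. The point of the sketch is the interface, not the task.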

Mercury (Fintech)

Mercury, the episode's sponsor, is cited as a specific example of how AI-integrated platforms currently handle operational overhead for businesses.

Automated Accounts Payable: The platform handles the full lifecycle of invoicing: receiving emails, scanning data (name, amount, due date), and creating draft payments for review.
Business Integration: It acts as a central hub for business banking, reducing the need for manual data entry.
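
The invoice lifecycle described above (receive an email, extract name, amount, and due date, create a draft for review) can be sketched as follows. This is not Mercury's API or implementation: the `DraftPayment` type, the `draft_from_email` function, the email layout, and the sample invoice are all made up for illustration, and a production system would need far more robust parsing.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DraftPayment:
    vendor: str
    amount_cents: int
    due: date
    status: str = "needs_review"  # a human approves before any money moves


def draft_from_email(body: str) -> Optional[DraftPayment]:
    """Pull vendor, amount, and due date out of a plain-text invoice email."""
    vendor = re.search(r"^From:\s*(.+)$", body, re.MULTILINE)
    amount = re.search(r"Amount due:\s*\$(\d+)\.(\d{2})", body)
    due = re.search(r"Due date:\s*(\d{4})-(\d{2})-(\d{2})", body)
    if not (vendor and amount and due):
        return None  # leave unparseable emails for manual handling
    cents = int(amount.group(1)) * 100 + int(amount.group(2))
    year, month, day = (int(part) for part in due.groups())
    return DraftPayment(vendor.group(1).strip(), cents, date(year, month, day))


# Made-up sample email in an assumed layout.
EMAIL = """From: Example Design Studio
Subject: Invoice 0042

Amount due: $1250.00
Due date: 2026-03-15
"""

draft = draft_from_email(EMAIL)
print(draft)
```

Storing the amount in integer cents avoids floating-point rounding, and the default `needs_review` status mirrors the "draft payments for review" step. The regular expressions assume this exact layout; amounts with thousands separators, for instance, would fall through to manual handling.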

Takeaways

Efficiency Play: For small to medium enterprises (SMEs), Mercury represents the "current state of the art" in AI-assisted business operations.
Note on Risk: As mentioned in the transcript, Mercury is a fintech company, not a bank. Banking services are provided through partners like Choice Financial Group and Column N.A. (Members FDIC).

Specialized AI Training Techniques

The podcast identifies specific technical shifts that will determine which AI labs or startups "win" the next phase of development.

Dreaming / Test-Time Training: This is described as a "fourth axis of scaling" (alongside pre-training, RL, and inference). It involves the model spending compute to build its own RL environments to rehearse skills.
Sample Efficiency: The "holy grail" of AI investment. Humans can learn from one example; AI currently needs millions. Any company that makes a breakthrough in Sample Efficiency will have a massive cost advantage.

Takeaways

Bullish Sentiment on "Agentic" AI: The speaker is highly optimistic that by 2027-2028, AI will transition from a "grad student who never had an internship" to a "veteran worker" that learns from every interaction.
Key Metric for Investors: When evaluating AI startups, look beyond "context window size" and ask about "Weight Update Efficiency." Can the model learn from a single user's feedback and apply it to future sessions permanently?

Major Tech Players Mentioned

Amazon (AMZN): Mentioned as a barrier to AI training due to bot-detection (anti-scraping) measures, highlighting the tension between web platforms and AI trainers.
SpaceX: Used as a benchmark for high-level general intelligence; the goal is an AI that could theoretically build a company of this complexity if given the capital.
Mercury: Highlighted for its automated financial workflows.

Episode Description

Read it here.

Thanks to Mercury for sponsoring this essay. Mercury has automated basically my entire bill pay process for my business. I just give contractors a dedicated email address, and when they send an invoice, Mercury automatically creates a draft payment for me to review. I no longer have to hunt through my inbox for invoices or deal with messy spreadsheets to track my bills. Mercury handles it all. Learn more at mercury.com.

Timestamps:
(00:00:00) – The big research bet the labs are making
(00:02:12) – Grindability is just as important as verifiability
(00:06:10) – Will RLVR alone generalize?
(00:08:41) – Getting the learning back to the weights
(00:15:22) – Dreaming
(00:17:23) – What 2027 looks like

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

About Dwarkesh Podcast

Dwarkesh Podcast

By Dwarkesh Patel

Deeply researched interviews

www.dwarkesh.com