The data black hole at the center of AI
Podcast, 11 min 57 sec
Note: AI-generated summary based on third-party content. Not financial advice.
Quick Insights

Investors should prioritize exposure to the Data Preparation and RLHF (Reinforcement Learning from Human Feedback) sectors, as companies like Surge AI and Mercor are essential "picks and shovels" for AI labs. Although open-source models are rapidly closing the gap with frontier models, the primary investment moat lies in proprietary, expert-level human datasets rather than in software architecture alone. In autonomous vehicles and robotics, Tesla and Waymo are the high-conviction plays, because they rely on "brute force" data collection at massive scale to overcome today's gap in learning efficiency. Despite automation fears, the speaker expects demand for human software engineers to be higher in 2027 than it is today, suggesting investors should favor firms that use AI to augment professional productivity rather than replace it. For those tracking fintech, Mercury (the episode's sponsor) is positioning itself as a leading private player in AI-native banking through its automated financial management tools.

Detailed Analysis

Data Labeling and RLHF Services (Mercor, Surge AI)

The podcast highlights that AI progress is currently driven by "mind-stretching" amounts of human expert data rather than by algorithmic efficiency alone. This has created a massive, multi-billion-dollar industry for bespoke data generation and Reinforcement Learning from Human Feedback (RLHF).

  • Human Expert Trajectories: Models require hundreds of experts per skill (e.g., legal, medical, management consulting) to generate "chain-of-thought" data and rubrics.
  • Bespoke Data Listings: The speaker mentions platforms such as Mercor and Surge AI, which hire specialists (e.g., legal experts, word specialists) specifically to create training data for AI labs.
  • Revenue Growth: This data industry already earns billions of dollars annually, and the speaker projects it will soon reach "decabillions" (tens of billions).

Takeaways

  • Investment Opportunity: Look for exposure to companies providing high-quality, human-in-the-loop data labeling and RLHF services. As AI labs move beyond "internet scrapings," the value of verified, expert-level human data is skyrocketing.
  • Sector Growth: The "Data Preparation" sector is a critical bottleneck for frontier labs (OpenAI, Google, Anthropic), making these service providers essential "picks and shovels" for the AI gold rush.

Open Source AI Models

The transcript discusses the competitive landscape between closed frontier models (like GPT-4) and open-source alternatives.

  • Catch-up Speed: Open-source models currently lag state-of-the-art frontier models by only about four months (citing Epoch research).
  • Data Distillation: Laggards can catch up relatively easily because training data can be "distilled" from frontier models through their public APIs.
  • Architectural Moats: The speaker suggests that hyperparameters and architectural tricks are not the primary drivers of progress; if they were, open-source models would struggle much more to keep up.

Takeaways

  • Sentiment: Neutral to Bullish on the viability of open-source AI.
  • Insight: The "moat" for major AI labs may be thinner than previously thought if their primary advantage is simply a proprietary dataset that can eventually be replicated or distilled by competitors.

Robotics and Autonomous Vehicles (Tesla, Waymo)

The discussion touches on the massive data requirements for physical AI, such as humanoid robots and self-driving cars.

  • Sample Inefficiency: Current AI systems require three to four orders of magnitude (1,000 to 10,000 times) more data than a human to learn to drive or to operate a robot arm.
  • Market Potential: If AI could match human-level "sample efficiency" (learning from as few examples as a person needs), robotics would become a "decatrillion-dollar industry" (tens of trillions of dollars).
  • Current Leaders: Tesla and Waymo are specifically mentioned as the entities currently firehosing massive amounts of data to overcome the efficiency gap in self-driving.

Takeaways

  • Risk Factor: The "Sample Efficiency Gap." Until AI can learn to move and react with the efficiency of a human (who learns to drive in 20 hours), the scaling of robotics will remain incredibly capital-intensive and slow.
  • Long-term Play: Tesla and Waymo are the primary bets in the "data-brute-force" approach to autonomy.
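As a back-of-the-envelope illustration of the sample-efficiency gap, the calculation below starts from the roughly 20 hours a human needs to learn to drive. It then applies the three-to-four-orders-of-magnitude multiplier cited above. Expressing the gap directly in hours of driving experience is an illustrative assumption, not a figure from the episode; the symbols $T_{\text{human}}$, $T_{\text{AI}}$, and $k$ are introduced here only for this sketch.

```latex
% Illustrative assumption: the data gap is measured in hours of driving experience
\[
T_{\text{AI}} \approx T_{\text{human}} \times 10^{k}, \qquad k \in \{3, 4\}
\]
\[
20\ \text{h} \times 10^{3} = 2 \times 10^{4}\ \text{h}, \qquad
20\ \text{h} \times 10^{4} = 2 \times 10^{5}\ \text{h}
\]
```

Under that assumption, a model would need roughly 20,000 to 200,000 hours of driving-equivalent data, about 2.3 to 23 years of nonstop driving. This is why the "data-brute-force" approach is so capital-intensive.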

Software Engineering and White-Collar Automation

The speaker addresses the common fear that AI will immediately replace high-level white-collar jobs.

  • Software Engineering: Despite AI's coding capabilities, the speaker bets there will be more demand for human software engineers in 2027 than there is now.
  • Complementary Input: In the short term, AI is viewed as a complement that raises human productivity rather than a wholesale replacement. This is especially true for "out-of-distribution" tasks (novel problems unlike those seen in training).
  • Automation Targets: Jobs that are mechanical and predictable (like bank tellers or travel agents) are at higher risk than those requiring complex, open-ended problem-solving.

Takeaways

  • Investment Insight: Be cautious of "AI displacement" narratives that predict immediate collapse in professional services. Instead, look for companies that successfully integrate AI to augment their workforce.
  • Sector Focus: Software engineering remains a high-growth area despite AI, as the "complementary" nature of the tool may drive higher total output and demand.

Mercury (Fintech/Banking)

The episode's sponsor segment highlights a fintech platform and its new AI feature.

  • Mercury Command: An AI tool built into the Mercury banking platform that allows business owners to manage cash flow, taxes, and transfers via a chat interface.
  • Functionality: It integrates with transaction history and invoices to automate financial tasks.

Takeaways

  • Product Awareness: For business owners or investors tracking the fintech space, Mercury is positioning itself as a leader in "AI-native" banking.
  • Note: Mercury is a fintech company, not an FDIC-insured bank; its banking services are provided through partner banks such as Choice Financial Group.
Episode Description
Thanks to Mercury for sponsoring this essay! Mercury just released a new feature called Command, which gives me AI right in my banking platform. And since I use Mercury to run basically my entire business, Command has access to all the info it needs to get real work done. I can ask it to send invoices, or categorize expenses, or even transfer money… and Command just handles it. Learn more at mercury.com/command

Timestamps:
  • (00:00:00) – What is really driving AI progress?
  • (00:03:11) – Comparing human vs AI sample efficiency
  • (00:08:46) – Does sample efficiency matter?

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
About Dwarkesh Podcast
Dwarkesh Podcast

By Dwarkesh Patel

Deeply researched interviews

www.dwarkesh.com