How Intelligent Is AI, Really?
Podcast · 11 min 59 sec
Note: AI-generated summary based on third-party content. Not financial advice.
Quick Insights

  • The AI investment landscape is shifting toward models with true reasoning ability, measured by new benchmarks like ARC-AGI.
  • Consider Google (GOOGL), a key competitor directly developing and validating its own frontier AI models.
  • Gain exposure to leading lab OpenAI through its primary partner, Microsoft (MSFT), which is integrating this advanced AI across its products.
  • Amazon (AMZN) offers another strategic position in the AI arms race via its major investment in top-tier research lab Anthropic.
  • Monitor these companies' performance on advanced reasoning benchmarks; it is the key indicator of long-term technological leadership.

Detailed Analysis

Google (GOOGL)

  • Google's AI models, Gemini 3 Pro and DeepThink, were mentioned as being at the forefront of AI development.
  • The company is one of the major "frontier labs" that now uses the advanced ARC-AGI benchmark to measure and report the performance of its new models. This benchmark specifically tests for reasoning and the ability to learn new skills efficiently, which is considered a better measure of true intelligence than older benchmarks.

Takeaways

  • Google is a key competitor in the race for Artificial General Intelligence (AGI). Its adoption of the challenging ARC-AGI benchmark signals a commitment to developing more sophisticated and human-like AI.
  • Investors should monitor Google's performance on these next-generation benchmarks, as strong results could indicate a durable technological advantage over competitors who may be focused on less meaningful metrics.

Microsoft (MSFT)

  • Microsoft is mentioned through its close partnership with and investment in OpenAI, the creator of the GPT series of models.
  • The transcript notes that early GPT models performed poorly on the ARC-AGI benchmark (scoring only 4-5%), highlighting that pure scale wasn't enough to achieve true intelligence.
  • However, OpenAI is now listed among the top labs using this benchmark for their latest model releases, indicating significant progress in their model's reasoning capabilities.

Takeaways

  • Microsoft's investment in OpenAI provides it with direct exposure to one of the most important players in the AI landscape.
  • The progress that OpenAI is making on difficult reasoning benchmarks is a strong positive indicator for Microsoft's long-term AI strategy and its ability to integrate cutting-edge technology into its products and services.

Amazon (AMZN)

  • Amazon is mentioned in the context of its investment in Anthropic, another leading AI research lab.
  • Anthropic and its latest model, Claude Opus 4.5, are recognized as being part of the elite group of "frontier labs" pushing the boundaries of AI and using the ARC-AGI benchmark to validate their progress.

Takeaways

  • Amazon is strategically positioning itself in the AI race not only through its internal development (AWS) but also by backing other key innovators like Anthropic.
  • The success and technical validation of Anthropic's models on advanced benchmarks can be seen as a positive development for Amazon, strengthening its overall position and ecosystem in the AI industry.

Investment Theme: The AI Arms Race & True Intelligence

  • The podcast highlights a critical shift in evaluating AI. The focus is moving away from benchmarks that test memorized knowledge (like MMLU) towards those that measure true reasoning and generalization, such as ARC-AGI.
  • The discussion points out that OpenAI's o1 model saw performance on this benchmark jump from roughly 4% to 21%, demonstrating a "transformational" leap in reasoning that older, larger models had missed.
  • The key players in this race are identified as OpenAI (backed by MSFT), Google, xAI (private), and Anthropic (backed by AMZN, GOOGL).

Takeaways

  • Look beyond the headlines: Investors should be cautious of "vanity metrics." A company announcing the highest score on an older benchmark may not have the best underlying technology.
  • Reasoning is the new frontier: The ability of an AI to solve novel problems and learn efficiently is the next major hurdle. Companies whose models excel on benchmarks like ARC-AGI may have a significant long-term advantage.
  • Monitor rapid shifts: a dramatic jump on a reasoning benchmark can quickly reorder the leaderboard, so it is important to watch the entire competitive landscape rather than just the current leader.

Investment Theme: AI Compute & Efficiency

  • The guest states that AI performance can often be boosted by simply "throwing more compute at something," reinforcing the idea that the demand for processing power is a fundamental driver of the AI industry.
  • A future metric for evaluating AI models will be efficiency, measured by:
    • The amount of training data needed to learn a skill.
    • The amount of energy consumed to perform a task.
  • The new ARC-AGI 3 benchmark will compare the number of actions an AI takes to solve a problem versus a human, penalizing "brute force" solutions that require millions of attempts.
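To make the action-count comparison concrete, here is a minimal sketch of what such an efficiency-normalized score could look like. This is a hypothetical illustration, not the official ARC-AGI 3 formula; the function name and the simple ratio-based penalty are assumptions for clarity.

```python
# Hypothetical sketch (not the official ARC-AGI 3 formula): an
# efficiency-normalized score that penalizes brute-force solutions by
# comparing the number of actions a model takes against a human baseline.

def efficiency_score(solved: bool, model_actions: int, human_actions: int) -> float:
    """Return a score in [0, 1]: full credit only when the model solves
    the task in about as many actions as a human; more actions shrink it."""
    if not solved or model_actions <= 0:
        return 0.0
    # A ratio of 1.0 means the model matched the human's action count;
    # a model needing far more actions is scored proportionally lower.
    return min(1.0, human_actions / model_actions)

# A model needing 1,000,000 attempts where a human needs 50 scores near 0,
# while a model matching the human's 50 actions earns full credit.
print(efficiency_score(True, 1_000_000, 50))  # → 5e-05
print(efficiency_score(True, 50, 50))         # → 1.0
```

Under a scheme like this, raw success rate alone no longer dominates: two models that both solve a task can be separated by how economically they did so, which is the "brute force" penalty the benchmark describes.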

Takeaways

  • Compute remains king: The ongoing need for massive computational power is a strong bullish signal for companies that provide the underlying infrastructure for AI, such as semiconductor designers, manufacturers, and cloud service providers.
  • Efficiency is the next battleground: As the AI industry matures, the focus will shift from raw power to efficient performance. Companies that can develop AI models that learn faster with less data and consume less energy will have a significant cost and operational advantage. This could benefit both the AI model creators and the hardware companies that enable this efficiency.
Episode Description
ARC-AGI is redefining how to measure progress on the path to AGI - focusing on reasoning, generalization, and adaptability instead of memorization or scale. During this month's NeurIPS 2025 conference, YC's Diana Hu sat down with ARC Prize Foundation President Greg Kamradt to find out why most AI benchmarks fail, how ARC-AGI reveals the limits of today’s models, and why measuring intelligence may be harder than building it.
About Y Combinator Startup Podcast
By Y Combinator

We help founders make something people want.