What I Learned Testing GPT-5.5
What I Learned Testing GPT-5.5
Podcast36 min 32 sec
Listen to Episode
Note: AI-generated summary based on third-party content. Not financial advice. Read more.
Quick Insights

Investors should prioritize OpenAI (Private) as it reclaims the lead in the AI race with GPT-5.5, which significantly outperforms competitors in agentic coding and enterprise tasks. While OpenAI is the high-conviction play for software development and general knowledge work, maintain exposure to Anthropic (Private) for its specialized dominance in creative planning and design aesthetics. Look for enterprise opportunities in KPMG and Blitzy, which are successfully leveraging AI agents to drastically accelerate engineering velocity and operational efficiency. The strategic shift toward "AI inference" suggests a long-term investment focus on companies that prioritize intelligence per dollar and efficient chip utilization over raw model size. Be prepared for a "multi-model" subscription environment, as users will likely pay for both OpenAI and Anthropic to bridge the performance gaps found in each platform's specific weaknesses.

Detailed Analysis

OpenAI (Private)

The transcript highlights the release of GPT-5.5 (codenamed "Spud"), positioning it as a major leap forward in OpenAI’s attempt to reclaim the lead in the AI "arms race" against Anthropic.

  • Product Positioning: GPT-5.5 is described as a "knowledge work model" designed for writing, debugging, researching, and operating software across multiple tools.
  • Performance Benchmarks:
    • Scored 82.7 on Terminal Bench 2.0 (agentic coding), significantly beating Anthropic’s Opus 4.7 (69.4%).
    • Ranked #1 on the Artificial Analysis Intelligence Index, breaking a three-way tie with Anthropic and Google.
  • Strategic Shift: OpenAI is moving away from "hype-heavy" PR toward "iterative deployment" and "democratization." Leadership (Sam Altman) emphasized their transition into an "AI inference company," focusing on intelligence per dollar rather than just model size.
  • Codex Integration: OpenAI is heavily pushing Codex as the primary workspace for both developers and knowledge workers, utilizing new "compaction" features to handle long-running tasks (some reported up to 31 hours of continuous operation).

Takeaways

  • Market Share Recapture: Analysts suggest GPT-5.5 and Codex features could lead to a "market share recapture" from Anthropic, potentially boosting OpenAI's private market valuation.
  • Efficiency over Size: The model is rumored to be significantly smaller (2–5 trillion parameters) than Anthropic’s unreleased Mythos (10 trillion), yet performs at a similar level, suggesting higher profit margins on inference.
  • Enterprise Adoption: The 10% jump in accuracy for enterprise content tasks makes this a strong candidate for corporate AI integration.

Anthropic (Private)

The discussion centers on Anthropic’s recent struggles with model "laziness" and the mystery surrounding their unreleased model, Mythos.

  • The "Mythos" Factor: Anthropic has teased a step-change model (Mythos) but withheld it from the public due to "safety" or "compute constraints." Critics argue that a model that isn't public "does not exist" in terms of market impact.
  • Quality Issues: Anthropic recently published a post-mortem admitting to quality issues in Claude Code, confirming user suspicions that the tool had become "dumber" due to implementation errors.
  • Remaining Strengths: Despite the GPT-5.5 launch, reviewers noted that Opus 4.7 still maintains a lead in creative planning, aesthetics, and design taste.

Takeaways

  • Vulnerability: Anthropic is currently facing a "narrative shift" where users are migrating back to OpenAI due to speed and reliability issues.
  • Niche Dominance: Anthropic remains the preferred choice for high-end design and complex strategic planning, but it is losing the "daily driver" status for general coding and tasks.

AI Infrastructure & Software (Investment Themes)

The transcript identifies several companies and themes benefiting from the current AI evolution.

Key Companies Mentioned:

  • KPMG: Highlighted for its "Client Zero" strategy, embedding AI agents across its entire operating model rather than just buying tools.
  • Blitzy: An AI software development lifecycle (SDLC) tool reported to drive 5x engineering velocity for large enterprises (e.g., building payment apps in 6 weeks vs. 13 months).
  • Granola: An AI-powered notepad for meetings that operates without intrusive bots, focusing on post-call workflows and coaching.
  • Mercury: A fintech/banking platform noted for modernizing business banking for startups and founders.

Investment Themes:

  • Agentic Workflows: The focus has shifted from "chatting with AI" to "agents that do work." GPT-5.5’s ability to run for 7–31 hours autonomously marks a transition toward autonomous AI employees.
  • Inference Compute: As noted by OpenAI leadership, the value is shifting from training models to the efficiency of running them (inference). This suggests continued high demand for specialized chips and efficient cloud infrastructure.
  • The "Jagged Frontier": Even the best models have "chinks in the armor" (e.g., GPT-5.5’s poor performance on the "Vending Bench" business simulation). This creates opportunities for multi-model setups where users pay for both OpenAI and Anthropic to cover each other's weaknesses.

Risks & Considerations

  • Cost vs. Value: GPT-5.5 is double the price of its predecessor (GPT-5.4) and 20% more expensive than Opus 4.7. While more efficient, the high per-token cost may be a barrier for high-volume applications.
  • "Dumbification" of Models: The transcript notes that software "harnesses" (the apps around the AI) can sometimes make a smart model perform poorly, as seen with recent Claude Code issues.
  • Diminishing Returns for Average Users: For 99% of basic tasks (email, simple summaries), the difference between GPT-5.4 and 5.5 may be unnoticeable, potentially slowing the upgrade cycle for non-technical consumers.
Ask about this postAnswers are grounded in this post's content.
Episode Description
GPT 5.5 is here, and the first reactions are split between benchmark dominance, coding debates, Anthropic comparisons, and questions about whether the upgrade will feel dramatic to everyday users. NLW breaks down the launch, the “real work” positioning, the Mythos backdrop, and what changed in OpenAI’s communication strategy, then shares what he learned testing GPT 5.5 across writing, coding, strategy, design, spreadsheets, and data analysis. AI Practitioner's Credential Survey - ⁠⁠⁠⁠https://tally.so/r/vGOLr4⁠⁠⁠⁠ Brought to you by: KPMG – Agentic AI is powering a potential $3 trillion productivity shift, and KPMG’s new paper, Agentic AI Untangled, gives leaders a clear framework to decide whether to build, buy, or borrow—download it at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.kpmg.us/Navigate⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Granola - The AI notepad for people in back-to-back meetings. 100% off your first 3 months with code AIDAILY at ⁠⁠⁠⁠⁠⁠⁠⁠http://granola.ai/aidaily⁠⁠⁠⁠⁠⁠⁠⁠ Mercury - Modern banking for business and now personal accounts. Learn more at ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠https://mercury.com/personal-banking⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Zenflow Work - Agents for knowledge work - ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠https://zenflow.free/⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Drata - The agentic trust management platform - ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠https://drata.com/⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Blitzy - Want to accelerate enterprise software development velocity by 5x? ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠https://blitzy.com/⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ AssemblyAI - The best way to build Voice AI apps - ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠https://www.assemblyai.com/brief⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Robots & Pencils - Cloud-native AI solutions that power results ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠https://robotsandpencils.com/⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ The Agent Readiness Audit from Superintelligent - Go to ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠https://besuper.ai/ ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠to request your company's agent readiness score. The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠https://pod.link/1680633614⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Our Newsletter is BACK: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠https://aidailybrief.beehiiv.com/⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring the show? sponsors@aidailybrief.ai
About The AI Daily Brief (Formerly The AI Breakdown): Artificial Intelligence News and Analysis
The AI Daily Brief (Formerly The AI Breakdown): Artificial Intelligence News and Analysis

The AI Daily Brief (Formerly The AI Breakdown): Artificial Intelligence News and Analysis

By Nathaniel Whittemore

A daily news analysis show on all things artificial intelligence. NLW looks at AI from multiple angles, from the explosion of creativity brought on by new tools like Midjourney and ChatGPT to the potential disruptions to work and industries as we know them to the great philosophical, ethical and practical questions of advanced general intelligence, alignment and x-risk.