Dylan Patel — Deep Dive on the 3 Big Bottlenecks to Scaling AI Compute
Podcast · 2 hr 31 min
Note: AI-generated summary based on third-party content. Not financial advice.
Quick Insights

• NVIDIA (NVDA) remains a top-tier conviction play as it secures roughly 70% of TSMC’s 3nm capacity through 2027, with the transition to Blackwell and Rubin architectures expected to drive significant margin expansion.
• ASML (ASML) is the ultimate industry bottleneck; its monopoly on EUV lithography tools makes its shipping guidance the primary leading indicator for the entire AI sector’s growth ceiling.
• A severe "memory crunch" is expected through 2027, making SK Hynix, Samsung, and Micron (MU) high-conviction buys as AI demand cannibalizes standard chip supply and drives up prices.
• To play the critical energy constraints facing data centers, look toward infrastructure providers like GE Vernova (GEV), Vertiv (VRT), and Eaton (ETN) that specialize in modular power and cooling solutions.
• Monitor the massive $600 billion combined CapEx from Microsoft (MSFT), Google (GOOGL), Amazon (AMZN), and Meta (META): those who locked in long-term compute contracts early now hold a significant margin advantage over latecomers.

Detailed Analysis

NVIDIA (NVDA)

• Dominant Market Position: NVIDIA currently supplies the majority of high-end AI chips and is projected to control approximately 70% of TSMC’s N3 (3-nanometer) wafer capacity by 2027.
• Product Roadmap: The episode highlights the transition from Hopper (H100/H200) to Blackwell (B100/B200) and the upcoming Rubin architecture.
• Vertical Integration: NVIDIA is expanding beyond GPUs into CPUs (Grace), networking (InfiniBand/Ethernet), and switches to capture more of the data center spend.
• Pricing Power: Due to extreme supply constraints, H100s sometimes transact at higher prices today than they did three years ago, defying standard technology depreciation curves.

Takeaways

• Margin Expansion: NVIDIA is successfully "crowding out" competitors by securing long-term contracts with memory and logic vendors, allowing it to maintain high margins even as production costs rise.
• Software Advantage: The performance gap between NVIDIA and competitors is often driven by networking and software optimization. For certain models, the performance jump from Hopper to Blackwell is 20x, far exceeding the raw increase in FLOPS (floating-point operations per second).


ASML (ASML)

• The Ultimate Bottleneck: ASML is identified as the "lowest rung" and most critical bottleneck in the global AI scaling race. It is the sole provider of the EUV (extreme ultraviolet) lithography tools required to make leading-edge chips.
• Production Limits: ASML currently produces about 70–80 EUV tools per year, with plans to reach only 100+ by 2030.
• Complexity Risk: An EUV tool costs $300M–$400M and contains over 10,000 components from specialized suppliers such as Carl Zeiss (optics). The supply chain is "artisanal" and cannot quickly be scaled to higher production volumes.

Takeaways

• Investment Stability: ASML has historically not raised prices as aggressively as NVIDIA, despite holding a total monopoly. This suggests a more stable, long-term value play compared to the volatile GPU market.
• Capacity Ceiling: Global AI compute (measured in gigawatts) is effectively capped by how many EUV tools ASML can ship. Investors should watch ASML’s shipping guidance as the primary indicator of total AI sector growth.
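The capacity-ceiling logic can be made concrete with back-of-envelope arithmetic. The tool-shipment figures come from the episode; the per-tool wafer throughput and dies-per-wafer numbers below are purely illustrative assumptions, not sourced figures:

```python
# Toy model: how ASML's EUV shipment rate caps new leading-edge chip supply.
# Tool counts are from the episode (~70-80/yr today, ~100+/yr by 2030);
# throughput and yield figures are assumed for illustration only.

EUV_WAFERS_PER_TOOL_PER_MONTH = 1_000  # assumed wafer starts enabled per tool
GPU_DIES_PER_WAFER = 60                # assumed good dies per 300mm wafer

def annual_gpu_ceiling(tools_shipped: int) -> int:
    """Rough upper bound on new GPUs enabled by one year of EUV tool shipments."""
    wafers_per_year = tools_shipped * EUV_WAFERS_PER_TOOL_PER_MONTH * 12
    return wafers_per_year * GPU_DIES_PER_WAFER

print(f"{annual_gpu_ceiling(80):,}")   # ~80 tools/yr (today's rate)
print(f"{annual_gpu_ceiling(100):,}")  # ~100 tools/yr (planned by 2030)
```

Whatever the exact constants, the point stands: added compute scales linearly with tool shipments, which is why ASML's guidance acts as a hard ceiling on the whole sector.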


Memory Manufacturers (SK Hynix, Samsung, Micron)

• The Memory Crunch: AI models require massive amounts of HBM (high-bandwidth memory). HBM yields roughly 3x to 4x fewer bits per wafer than standard DRAM, so it consumes a disproportionate share of fab capacity.
• CapEx Shift: By 2026, an estimated 30% of Big Tech’s AI CapEx will be spent specifically on memory.
• Underinvestment: Memory vendors lost money in 2023 and stopped building new factories (fabs). Since a fab takes about two years to build, a severe supply shortage is expected through 2026–2027.
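The bits-per-wafer penalty explains why HBM demand squeezes everything else. A minimal sketch, assuming the episode's 3–4x penalty (absolute bit counts and the 30% wafer share are illustrative):

```python
# Illustrative: shifting DRAM wafer starts to HBM shrinks total bit supply.
# The 3-4x bits-per-wafer penalty is from the episode; other numbers are assumed.

DRAM_BITS_PER_WAFER = 4.0  # arbitrary units
HBM_PENALTY = 3.5          # HBM yields ~3.5x fewer bits per wafer

def total_bits(total_wafers: float, hbm_share: float) -> float:
    """Total memory bits produced when hbm_share of wafer starts go to HBM."""
    hbm_wafers = total_wafers * hbm_share
    dram_wafers = total_wafers - hbm_wafers
    return (dram_wafers * DRAM_BITS_PER_WAFER
            + hbm_wafers * DRAM_BITS_PER_WAFER / HBM_PENALTY)

baseline = total_bits(100, 0.0)
with_hbm = total_bits(100, 0.3)  # 30% of wafer starts diverted to HBM
print(f"bit supply falls {1 - with_hbm / baseline:.0%}")  # → 21%
```

Diverting 30% of wafers cuts total bit output by roughly a fifth under these assumptions, which is the mechanism behind the consumer-price pass-through described below.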

Takeaways

• Bullish Sentiment: The "memory crunch" is a high-conviction theme. As AI models move toward "long context" (remembering more information), demand for memory scales faster than demand for raw processing power.
• Consumer Impact: High HBM demand is cannibalizing the supply for smartphones and PCs. Expect prices for consumer electronics (iPhones, laptops) to rise by $100–$250 as manufacturers pass on memory costs.


Hyperscalers & AI Labs (Microsoft, Google, Amazon, Meta, OpenAI, Anthropic)

• Massive CapEx: The "Big Four" (Amazon, Meta, Google, Microsoft) have a combined forecasted CapEx of $600 billion.
• Commitment Issues: Anthropic is noted as more conservative in its compute orders than OpenAI, which has signed "crazy" long-term deals. This has left Anthropic scrambling for "spot" compute at 50% markups.
• Revenue Inflection: Anthropic’s revenue reportedly jumped from $4B to $6B ARR in just one month, suggesting the ROI on AI compute is currently very high.

Takeaways

• First-Mover Advantage: Companies like OpenAI that locked in five-year compute contracts two years ago now have a massive margin advantage over latecomers who must pay current market rates.
• Sovereign AI: Google and Amazon are increasingly "AGI-pilled," investing directly in energy (turbines, land, power agreements) to bypass utility bottlenecks.


Energy & Infrastructure (GE Vernova, Vertiv, Eaton)

• Behind-the-Meter Power: To avoid slow utility permitting, companies are building their own power plants at data center sites using gas turbines, fuel cells (Bloom Energy), and even ship engines.
• Supply Chain Diversification: While turbines are a bottleneck, the industry is pivoting to "aeroderivatives" (jet engines adapted for power generation) and massive battery arrays.

Takeaways

• Infrastructure Opportunity: The "picks and shovels" of the power grid, including transformers, cooling systems, and modular data center "skids," are essential as data centers move from 100 kilowatts per rack to 1 megawatt per rack.
• Labor Shortage: High-skilled electricians and plumbers are becoming a major constraint. Companies that can modularize (build sections in a factory and ship them to the site) will win.
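The 100 kW-to-1 MW rack transition can be put in perspective with simple arithmetic. The density endpoints are from the episode; the 1 GW campus power budget is an illustrative assumption:

```python
# Rack counts at a fixed campus power budget as density rises 100 kW -> 1 MW/rack.
# Density endpoints are from the episode; the 1 GW campus size is assumed.

CAMPUS_POWER_W = 1e9  # assumed 1 GW campus

def racks_at(watts_per_rack: float) -> int:
    """How many racks a fixed campus power budget supports at a given density."""
    return int(CAMPUS_POWER_W // watts_per_rack)

print(racks_at(100e3))  # 100 kW/rack -> 10000 racks
print(racks_at(1e6))    # 1 MW/rack  ->  1000 racks
```

Ten times fewer racks deliver the same power draw, but each rack now concentrates a megawatt of heat, which is why specialized cooling, transformers, and prefabricated "skids" become the bottleneck products.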


China & Geopolitical Risks

• Huawei’s Potential: Huawei is described as a formidable competitor that might have eclipsed NVIDIA if not for US sanctions, thanks to its vertical integration across chips, networking, and software.
• Indigenization: China is working to build a fully independent supply chain. While it may have working EUV tools by 2030, "production hell" (scaling to mass volume) will likely take several more years to overcome.
• Taiwan Risk: A blockade or conflict in Taiwan would reduce global AI compute expansion to "almost zero," as Intel and Samsung cannot currently replace TSMC’s volume.

Takeaways

Timeline Matters: In a "fast takeoff" AI scenario (1–3 years), the US wins due to current chip leads. In a "slow takeoff" (10+ years), China’s ability to vertically integrate and scale its own supply chain could allow it to overtake the West.

Episode Description
Dylan Patel, founder of SemiAnalysis, provides a deep dive into the three big bottlenecks to scaling AI compute: logic, memory, and power. He also walks through the economics of labs, hyperscalers, foundries, and fab equipment manufacturers. Learned a ton about every single level of the stack. Enjoy! Watch on YouTube; listen on Apple Podcasts or Spotify.

Sponsors

* Mercury has already saved me a bunch of time this tax season. Last year, I used Mercury to request W-9s from all the contractors I worked with. Then, when it came time to issue 1099s this year, I literally just clicked a button and Mercury sent them out. Learn more at mercury.com.
* Labelbox noticed that even when voice models appear to take interruptions in stride, their performance degrades. To figure out why, they built a new evaluation pipeline called EchoChain. EchoChain diagnoses voice models’ specific failure modes, letting you understand what your model needs to truly handle interruptions. Check it out at labelbox.com/dwarkesh.
* Jane Street is basically a research lab with a trading desk attached, and their infrastructure backs this up. They’ve got tens of thousands of GPUs, hundreds of thousands of CPU cores, and exabytes of storage. This is what it takes to find subtle signals hidden deep within noisy market data. If this sounds interesting, you can explore open positions at janestreet.com/dwarkesh.

Timestamps

(00:00:00) – Why an H100 is worth more today than 3 years ago
(00:24:52) – Nvidia secured TSMC allocation early; Google is getting squeezed
(00:34:34) – ASML will be the #1 constraint for AI compute scaling by 2030
(00:56:06) – Can’t we just use TSMC’s older fabs?
(01:05:56) – When will China outscale the West in semis?
(01:16:20) – The enormous incoming memory crunch
(01:42:53) – Scaling power in the US will not be a problem
(01:55:03) – Space GPUs aren’t happening this decade
(02:14:26) – Why aren’t more hedge funds making the AGI trade?
(02:18:49) – Will TSMC kick Apple out from N2?
(02:24:35) – Robots and Taiwan risk
About Dwarkesh Podcast

By Dwarkesh Patel

Deeply researched interviews. www.dwarkesh.com