
The shift from training to inference is driving a transition toward hardware-algo co-design, increasing switching costs and reducing model portability across different systems. Specific hardware architectures mentioned as critical to this trend include Nvidia's Hopper, Blackwell (GB300), and Rubin racks, as well as Google's TPU, Amazon's Trainium, and Cerebras. The analysis highlights a growing architectural divergence between American systems optimized for power efficiency and Chinese systems like the Huawei Cloudmatrix 384 and Atlas SuperPoD, which prioritize scale-up domain size over power consumption.