134. A “Survey of Data” with Xie Chen: The History, Landscape, Pyramid, and Recipe of AI and Robotics Data | 张小珺Jùn｜商业访谈录


Podcast · 2 hr 38 min


Note: AI-generated summary based on third-party content. Not financial advice.

Quick Insights

Investors should prioritize NVIDIA (NVDA) as it cements its role as the essential infrastructure provider for autonomous driving and robotics through its Orin chips and Omniverse simulation platform. Meta Platforms (META) offers a unique data-collection advantage via its Ray-Ban smart glasses, which serve as a "Trojan Horse" to capture the first-person physical world data necessary for training future AI agents. While Tesla (TSLA) remains a leader in data scaling through its massive vehicle fleet, the industry is shifting toward "Embodied AI" and Vision-Language-Action (VLA) models, making specialized robotics labs and "Data Infrastructure" firms high-conviction themes. For exposure to the hardware side of this transition, monitor Chinese manufacturers like Xiaomi and Xpeng, which are leveraging their supply chains to lead in the physical deployment of humanoid robotics. Focus on companies that prioritize "AI Education" and high-quality synthetic data, as the ability to simulate and evaluate complex physical tasks is the next major bottleneck for the industry.

Detailed Analysis

Based on the detailed discussion regarding the evolution of AI data, robotics, and the competitive landscape between the US and China, here are the investment insights and asset analyses.

NVIDIA (NVDA)

The transcript highlights NVIDIA’s pivotal role not just as a hardware provider but as a central hub for autonomous driving and, through its Omniverse platform, for simulation.

Takeaways

• Dominance in Autonomous Driving: In 2021, NVIDIA’s Orin chip became the standard for Chinese EV makers (Nio, Xpeng, Li Auto), signaling its victory in the automotive intelligence supply chain.
• Simulation as a Moat: NVIDIA’s Omniverse is identified as a critical tool for creating "digital twins" and synthetic data, which is the next frontier for training robots (Embodied AI).
• Strategic Shift: The speaker notes that moving to NVIDIA from a specialized firm like Cruise became a "mainstream choice" because NVIDIA provides the foundational infrastructure for the entire industry.

Tesla (TSLA)

Tesla is discussed as the pioneer of the "Data Engine" model, though the speaker suggests the industry is moving toward a more collaborative ecosystem.

Takeaways

• The Data Engine Advantage: Tesla’s strength lies in its massive fleet (millions of cars), which provides a "free" feedback loop via Shadow Mode, where the car compares its internal logic against actual human driver actions.
• Scalability vs. Generalization: While Tesla is the leader in scaling data, the speaker suggests that for general robotics (beyond cars), the "Tesla model" of closed-loop data might be challenged by more open, collaborative data platforms.
• Humanoid Robotics: Tesla remains a benchmark for "brain-body" integration, but faces stiff competition in robotics from Chinese firms like Xiaomi.

Scale AI (Private)

Though currently private, Scale AI is frequently cited as the gold standard for the "Data Factory" business model, serving as a proxy for the value of high-quality data labeling.

Takeaways

• From Labeling to Education: The industry is shifting from simple data labeling (ImageNet style) to "AI Education" (RLHF), where experts are paid high premiums ($100+/hour) to provide feedback to models.
• The "Secret Sauce": Scale AI’s value isn't just in the data, but in the infrastructure and process used to manage human-in-the-loop feedback at scale.
• Investment Theme: Investors should look for companies that provide "Data Infrastructure" rather than just "Data Collection."

Meta Platforms (META)

The discussion touches on Meta’s hardware strategy as a means of data acquisition for the physical world.

Takeaways

• Meta Ray-Ban Success: The Meta Ray-Ban smart glasses are highlighted as a brilliant "Trojan Horse" for data collection. By making a stylish consumer product, Meta can collect "first-person perspective" data of human life, which is essential for training future AI agents.
• Hardware as Data Entry: For Meta, hardware is not just a product but a sensor network to feed their world-model AI.

The "Embodied AI" & Robotics Sector

The transcript identifies a massive shift occurring in the last 3–6 months regarding how robots are trained.

Key Players Mentioned:

• OpenAI / DeepMind / Google: Focused on the "General Brain" (VLA, Vision-Language-Action models).
• Xiaomi / Xpeng / Li Auto: Chinese players moving from automotive intelligence into humanoid or general-purpose robotics.
• Figure / Physical Intelligence (Pi): Emerging "Frontier Labs" in the US focusing on the intersection of big models and physical movement.

Takeaways

• The "VLA" Trend: The next big investment theme is VLA (Vision-Language-Action). This is the "brain" that allows a robot to understand a command, see the environment, and perform a task.
• Data Scarcity: While internet text data is "eaten up," physical world data (robotics data) is still a "Blue Ocean." Companies that can solve the Evaluation (Testing) problem for robots will hold immense value.
• Simulation vs. Real World: High-quality data that includes "Failure-to-Success" loops (e.g., a robot dropping an item and picking it back up) is significantly more valuable than "perfect" performance data.

Investment Risks & Factors

• The "Evaluation" Bottleneck: The speaker notes that as models get smarter, we need even smarter humans to grade them. If we cannot create effective "exams" for AI, progress will plateau.
• Complexity of Physical Scenes: Unlike the digital world, the physical world is "messy." A model that works in a lab may fail in a kitchen. This makes the "General Robot" (Zero-Shot learning) a very high-risk, long-term bet.
• US-China Competition: There is a clear divergence in strategy. The US (OpenAI, Google) is leading in "General Intelligence," while China (Xiaomi, EV makers) is leveraging its manufacturing and supply chain to lead in "Physical Deployment."


Episode Description

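In 2026, alongside guest interviews, we also hope to release some industry-focused episodes with insiders. Today is one such attempt. We know that data, compute, and algorithms are the troika driving artificial intelligence. For this episode we invited a returning guest, Xie Chen, founder and CEO of 光轮智能 (Lightwheel AI), to try to cover one of the three, data, in full within a single episode. We attempt an "industry survey of data." Data for large language models is hitting a wall, while data for robots sits in a desert. How does the data industry work? How will simulation data, the data pyramid, and the other approaches some people are exploring reshape the industry landscape?

OUTLINE:
00:01:07 Searching
00:20:09 Survey
00:41:39 Symbiosis
00:48:30 Forces
01:06:56 Journey
01:14:45 Signs
01:32:00 Comparison
01:42:40 Pyramid
01:55:31 Pricing
02:02:50 Recipe
02:17:06 Landscape
02:28:52 Endpoint

LINKS: Our podcast is available on all audio platforms, including Xiaoyuzhou, Apple Podcast, and Spotify; our video podcast is available on all video platforms, including Bilibili, Xiaohongshu, WeChat Channels, and Douyin; for the text version, search for our studio's WeChat official account: 语言即世界language is world.

DISCLAIMER: This content does not constitute investment advice.

CONTACT: xiaojunzhang@lisw.ai

Jump into the new world and explore with us! 😉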

About 张小珺Jùn｜商业访谈录


By 张小珺

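Striving to produce China's best technology and business interviews. Zhang Xiaojun (张小珺) is a business writer who produces in-depth reporting on Chinese business, covering AI, tech giants, venture capital, and prominent figures, and is the producer of the podcast 《张小珺Jùn | 商业访谈录》. If my interviews can keep you company along a stretch of a lonely, unknown road, and perhaps one day bring you a little closer to your destination, that would warm my heart :)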