Jensen Huang Delivered Eight Major Announcements in Just 1.5 Hours, Mapping NVIDIA's Path to Inference and Robotics Dominance
On January 5th at CES 2026, NVIDIA CEO Jensen Huang commanded the stage in his signature leather jacket for 1.5 hours, condensing the company’s most ambitious vision for the AI era into a single, high-impact presentation. Eight major announcements emerged from that window, shifting the competitive focus from AI model training toward what NVIDIA sees as the next frontier: cost-efficient inference at scale and physical AI integrated throughout the real world.
The subtext beneath these announcements reflects a fundamental market shift. As model sizes grow roughly 10x annually and inference token counts expand 5x yearly, while per-token costs drop 10x, the computing industry faces a new constraint: inference, not training, has become the bottleneck. NVIDIA’s entire Vera Rubin architecture, unveiled during the keynote, is engineered around this reality.
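A rough reading of those figures shows why inference, rather than training, sets the pace. The snippet below only restates the keynote's growth rates; the assumption that compute per token scales with model size is ours, for illustration.

```python
# Back-of-envelope illustration of the keynote's scaling figures.
# Assumption (ours): compute per token scales roughly with model size.
model_growth_per_year = 10    # model sizes ~10x per year (keynote figure)
token_growth_per_year = 5     # tokens served ~5x per year (keynote figure)
price_drop_per_year = 10      # per-token cost ~1/10 per year (keynote figure)

# Total inference compute demand then compounds multiplicatively:
compute_demand_growth = model_growth_per_year * token_growth_per_year  # ~50x per year

# For serving economics to hold while prices fall ~10x, cost per token must fall
# at least as fast, which is why per-token efficiency, not peak training
# throughput, becomes the binding constraint.
print(f"implied inference compute demand growth: ~{compute_demand_growth}x per year")
```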
Six Custom Chips in One Rack: The Vera Rubin AI Supercomputer Reimagines Density and Performance
At the hardware core sits the NVIDIA Vera Rubin POD, a purpose-built AI supercomputer integrating six proprietary chips designed to work in lockstep. This co-design philosophy marks a departure from the modular approach that defined prior generations. The flagship system, Vera Rubin NVL72, packs 2 trillion transistors into a single rack, delivering 3.6 EFLOPS of NVFP4 inference performance—a fivefold leap over the previous Blackwell generation.
The architecture breaks down as follows:
Vera CPU: Built around 88 custom Olympus cores with 176 threads of NVIDIA’s Space Multithreading technology. It supports 1.8TB/s of NVLink-C2C bandwidth, enabling seamless CPU-GPU unified memory. System memory scales to 1.5TB—triple that of the Grace CPU—with 1.2TB/s LPDDR5X bandwidth. The CPU doubles data-processing performance and introduces rack-level confidential computing, the first true TEE spanning both CPU and GPU domains.
Rubin GPU: The centerpiece introduces a Transformer engine enabling NVFP4 inference at 50 PFLOPS (5x Blackwell) and NVFP4 training at 35 PFLOPS (3.5x Blackwell). It supports HBM4 memory with 22TB/s bandwidth—2.8x the prior generation—critical for handling massive Mixture-of-Experts (MoE) models. Backward compatibility ensures smooth migrations from existing Blackwell deployments.
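As a quick sanity check, the rack-level figure quoted earlier follows directly from these per-GPU numbers, assuming (as with the Blackwell generation) that NVL72 denotes 72 GPUs per rack:

```python
# Cross-check of the NVL72 headline figure against the per-GPU Rubin numbers.
# Assumption: "NVL72" denotes 72 Rubin GPUs in one rack, as with Blackwell NVL72.
gpus_per_rack = 72
nvfp4_inference_pflops = 50   # per-GPU NVFP4 inference (quoted above)
nvfp4_training_pflops = 35    # per-GPU NVFP4 training (quoted above)
hbm4_bandwidth_tb_s = 22      # per-GPU HBM4 bandwidth (quoted above)

print(gpus_per_rack * nvfp4_inference_pflops / 1000)  # 3.6 EFLOPS, the NVL72 headline
print(gpus_per_rack * nvfp4_training_pflops / 1000)   # ~2.5 EFLOPS NVFP4 training
print(gpus_per_rack * hbm4_bandwidth_tb_s / 1000)     # ~1.6 PB/s aggregate HBM4 bandwidth
```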
NVLink 6 Switch: Per-lane speed jumps to 400Gbps, achieving 3.6TB/s full-interconnect bandwidth per GPU (2x prior generation). Total cross-switch bandwidth reaches 28.8TB/s, with in-network computing delivering 14.4 TFLOPS at FP8 precision. The system operates at 100% liquid cooling, eliminating thermal constraints.
ConnectX-9 SuperNIC: Provides 1.6Tb/s per-GPU bandwidth, fully programmable and software-defined for large-scale AI workloads.
BlueField-4 DPU: An 800Gbps smart NIC equipped with a 64-core Grace CPU and ConnectX-9. It offloads network and storage tasks while enhancing security—delivering 6x the compute performance and 3x memory bandwidth of the prior generation, with 2x faster GPU-to-storage access.
Spectrum-X 102.4T CPO: A co-packaged optical switch using 200Gbps SerDes technology, providing 102.4Tb/s per ASIC. The 512-port high-density configuration (800Gb/s per port) enables the entire system to operate as a unified entity rather than isolated components.
Assembly time has collapsed from two hours to five minutes, while maintenance windows have been eliminated through zero-downtime NVLink Switch architecture. The system’s modular design, now cableless and fanless at the compute tray level, makes it 18x faster to service than previous generations. These operational gains directly translate to reduced data center TCO and improved uptime.
Three Specialized Platforms Attack AI Inference’s Real Constraint: Context Storage and Throughput
While raw compute power improves 5x, inference presents a different problem—one that raw GPU cycles cannot solve alone. NVIDIA introduced three integrated products to address this gap, each targeting a specific bottleneck in the inference-scaled world.
Spectrum-X Ethernet Co-Packaged Optics: The Network as Critical Infrastructure
Traditional network switching consumes massive power and introduces latency that undermines inference performance. The Spectrum-X Ethernet CPO, based on the Spectrum-X architecture with a two-chip design, achieves 5x energy efficiency, 10x higher reliability, and 5x improved application uptime. The 512-port configuration operates at 800Gb/s per port, built on ASICs that each deliver 102.4Tb/s of switching capacity.
The implications are direct: processing more tokens per day translates to a lower cost per token, ultimately reducing data center TCO by a factor NVIDIA positions as transformational for hyperscale operators.
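A minimal cost model makes the relationship concrete. All numbers below are hypothetical placeholders, not NVIDIA or operator figures; the point is only that, with capital and power costs roughly fixed, cost per token falls in proportion to throughput.

```python
# Illustrative cost-per-token model (hypothetical placeholder numbers).
def cost_per_million_tokens(capex_usd, years, power_kw, usd_per_kwh,
                            tokens_per_second):
    """Amortized hardware + energy cost per one million generated tokens."""
    hours = years * 365 * 24
    hourly_capex = capex_usd / hours
    hourly_power = power_kw * usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600
    return (hourly_capex + hourly_power) / tokens_per_hour * 1e6

# Same rack cost and power, 5x the token throughput -> roughly 5x lower cost per token.
baseline = cost_per_million_tokens(3_000_000, 4, 120, 0.08, 50_000)
faster   = cost_per_million_tokens(3_000_000, 4, 120, 0.08, 250_000)
print(round(baseline, 2), round(faster, 2))
```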
The Inference Context Memory Storage Platform: Making KV Caches Practical at Scale
Inference workloads for Agentic AI systems—multi-turn dialogue, Retrieval-Augmented Generation (RAG), and multi-step reasoning—demand persistent context storage. Current systems face a paradox: GPU memory is fast but scarce; network storage is abundant but too slow for short-term context access. The NVIDIA Inference Context Memory Storage Platform bridges this gap by treating context as a first-class data type within the infrastructure.
Accelerated by BlueField-4 and Spectrum-X, this new storage tier connects to GPU clusters via specialized NVLink interconnects. Rather than recomputing key-value caches at every inference step, the system maintains them in optimized storage, achieving 5x better inference performance and 5x energy efficiency for context-heavy workloads. For AI systems evolving from stateless chatbots to stateful agents that reason across millions of tokens, this architectural addition removes a fundamental scaling bottleneck.
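The mechanism can be pictured as a two-level cache for per-session KV blocks: keep hot contexts in GPU memory, spill colder ones to the dedicated storage tier, and only fall back to a full prefill recompute on a genuine miss. The sketch below is conceptual; the class and method names are ours, not NVIDIA APIs.

```python
# Conceptual sketch of a context-memory tier for KV caches (hypothetical API).
from collections import OrderedDict

class KVCacheTier:
    def __init__(self, gpu_capacity: int):
        self.gpu_hot = OrderedDict()   # small, fast: resident in GPU/HBM
        self.context_store = {}        # large, slower: the dedicated storage tier
        self.gpu_capacity = gpu_capacity

    def put(self, session_id: str, kv_blocks: bytes) -> None:
        self.gpu_hot[session_id] = kv_blocks
        self.gpu_hot.move_to_end(session_id)
        while len(self.gpu_hot) > self.gpu_capacity:
            evicted_id, blocks = self.gpu_hot.popitem(last=False)
            self.context_store[evicted_id] = blocks   # spill instead of discard

    def get(self, session_id: str):
        if session_id in self.gpu_hot:
            return self.gpu_hot[session_id]            # hit in GPU memory
        if session_id in self.context_store:
            blocks = self.context_store.pop(session_id)
            self.put(session_id, blocks)               # promote back to the hot tier
            return blocks
        return None   # miss: the engine must re-run prefill for this context
```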
NVIDIA is collaborating with storage partners to integrate this platform directly into Rubin-based deployments, positioning it as a core element of turnkey AI infrastructure rather than an afterthought.
DGX SuperPOD (Vera Rubin Edition): The Factory Blueprint for Cost-Optimal Inference
The DGX SuperPOD serves as NVIDIA’s reference architecture for large-scale AI inference deployment. Built on eight DGX Vera Rubin NVL72 systems, it leverages NVLink 6 for scale-up connectivity within each rack, Spectrum-X Ethernet for scale-out across racks, and the Inference Context Memory Storage Platform for context orchestration. The entire stack is managed by NVIDIA Mission Control software.
The result: compared to Blackwell-era infrastructure, training equivalent-scale MoE models requires one quarter of the GPU count, and token costs for large MoE inference drop to one tenth. For cloud providers and enterprises, this is a massive economic lever: the same workload runs on vastly fewer GPUs, which can compound into multibillion-dollar infrastructure savings at scale.
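To put rough numbers on those ratios, here is an illustrative calculation. The fleet size, token volume, and baseline cost are hypothetical; only the 1/4 and 1/10 factors come from the keynote.

```python
# Illustrative fleet math for the claimed Rubin-vs-Blackwell ratios.
# The baseline fleet, token volume, and cost are hypothetical placeholders.
blackwell_gpus_for_training = 20_000        # hypothetical baseline fleet
monthly_tokens = 5e12                       # hypothetical inference volume
blackwell_cost_per_m_tokens = 1.00          # hypothetical baseline, $/1M tokens

rubin_gpus_for_training = blackwell_gpus_for_training / 4        # keynote's 1/4 claim
blackwell_inference_bill = monthly_tokens / 1e6 * blackwell_cost_per_m_tokens
rubin_inference_bill = blackwell_inference_bill / 10             # keynote's 1/10 claim

print(rubin_gpus_for_training)                           # 5,000 GPUs for the same MoE run
print(blackwell_inference_bill - rubin_inference_bill)   # ~$4.5M/month saved at this volume
```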
Nemotron, Blueprints, and the Open-Source Acceleration: Building Multi-Model, Multi-Cloud AI Systems
Alongside the hardware announcements, NVIDIA unveiled its largest open-source expansion yet. In 2025, the company contributed 650 open-source models and 250 open-source datasets to Hugging Face, making it the single largest contributor to the platform. Metrics cited in the keynote show open-source model usage has grown 20-fold over the past year, accounting for approximately 25% of all inference tokens.
The company is expanding the Nemotron family with new models: Agentic RAG systems, specialized safety models, and speech models designed for multimodal AI applications. Critically, NVIDIA is shipping these not as isolated models but as components within a larger framework called Blueprints.
Blueprints embodies a key architectural insight Jensen Huang derived from observing Perplexity and early-stage AI agent platforms: production-grade agentic AI is inherently multi-model, multi-cloud, and hybrid-cloud by nature. The framework enables developers to do the following, illustrated in the sketch after this list:
Route tasks dynamically: local private models for latency-sensitive workloads, cloud-frontier models for cutting-edge capabilities
Call external APIs and tools seamlessly (email systems, robot control interfaces, calendar services)
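A minimal routing sketch of that pattern, assuming hypothetical model endpoints and tool names (none of these identifiers come from Blueprints itself), might look like this:

```python
# Minimal sketch of multi-model routing plus tool calling (hypothetical names).
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Task:
    prompt: str
    latency_sensitive: bool
    needs_frontier_reasoning: bool

def call_local_model(prompt: str) -> str:
    return f"[local Nemotron-class model] {prompt[:40]}..."   # placeholder

def call_cloud_model(prompt: str) -> str:
    return f"[hosted frontier model] {prompt[:40]}..."        # placeholder

TOOLS: Dict[str, Callable[[str], str]] = {
    "send_email": lambda arg: f"email sent: {arg}",           # placeholder tool
    "move_robot": lambda arg: f"robot command issued: {arg}", # placeholder tool
}

def route(task: Task) -> str:
    # Latency-sensitive or private workloads stay on the local model;
    # frontier-grade reasoning goes to a hosted model.
    if task.latency_sensitive and not task.needs_frontier_reasoning:
        return call_local_model(task.prompt)
    return call_cloud_model(task.prompt)

result = route(Task("Summarize today's robot telemetry", True, False))
print(result, TOOLS["send_email"]("daily summary"))
```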
These capabilities, once science-fiction abstractions, are now accessible to developers through NVIDIA’s SaaS integration with Blueprints. Similar implementations are appearing on enterprise platforms including ServiceNow and Snowflake, signaling a shift toward systems-level thinking in enterprise AI.
The strategic implication: NVIDIA is democratizing access to frontier AI capabilities while entrenching its software ecosystem as the de facto standard for AI agent construction.
Physical AI: From Simulation to Reality—Alpha-Mayo and the Robotics Inflection Point
After infrastructure and open models, Huang pivoted to what he framed as the defining frontier: physical AI—systems that perceive the physical world, reason about it, and generate actions directly. The transition mirrors AI’s prior epochs: perceptual AI, generative AI, agentic AI. Physical AI represents the stage where intelligence enters embodied systems.
Huang outlined a three-computer architecture for physical AI development:
Training computers (DGX): Build foundational models
Inference computers (embedded chips in robots/vehicles): Execute real-time decisions
Simulation computers (Omniverse): Generate synthetic data and validate physical reasoning
The foundational model anchoring this stack is the Cosmos World Foundation Model, which aligns language, images, 3D geometry, and physical laws to support the full pipeline from simulation to live deployment.
Alpha-Mayo: Autonomous Driving as the Beachhead
Autonomous driving represents the first massive-scale deployment window for physical AI. NVIDIA released Alpha-Mayo, a complete system consisting of open-source models, simulation tools, and datasets for Level 4 autonomous driving development.
Alpha-Mayo operates on a reasoning-based paradigm rather than pure end-to-end learned behavior. The 10-billion-parameter model breaks a driving problem into discrete steps, reasons through the possibilities, and selects the safest trajectory. This architecture enables vehicles to handle previously unseen edge cases, such as traffic light failures at busy intersections, by applying learned reasoning rather than memorized patterns.
In real-world deployment, the system accepts text prompts, surround-view camera feeds, vehicle state history, and navigation input, outputting both a driving trajectory and a natural-language explanation of the reasoning. This transparency is critical for regulatory certification and passenger trust.
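To make that input/output contract concrete, here is a hypothetical sketch of what such an interface could look like. The dataclasses, field names, and the plan function are our own illustration, not the actual Alpha-Mayo API.

```python
# Hypothetical sketch of the described driving-stack interface (not NVIDIA's API).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DrivingInput:
    text_prompt: str                       # e.g. "proceed cautiously, light is out"
    camera_frames: List[bytes]             # surround-view camera feeds
    vehicle_state_history: List[dict]      # speed, heading, actuator state over time
    navigation_goal: Tuple[float, float]   # target waypoint

@dataclass
class DrivingOutput:
    trajectory: List[Tuple[float, float]]  # planned path as (x, y) waypoints
    explanation: str                       # natural-language reasoning trace

def plan(inputs: DrivingInput) -> DrivingOutput:
    # Stand-in for the model's reason-then-select loop: enumerate candidate
    # trajectories, score them for safety, and return the safest with a rationale.
    candidates = [[(0.0, 0.0), (0.0, 5.0)], [(0.0, 0.0), (1.0, 4.0)]]
    safest = candidates[0]
    return DrivingOutput(
        trajectory=safest,
        explanation="Traffic light inoperative; treating intersection as "
                    "an all-way stop and yielding before proceeding straight.",
    )
```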
Mercedes-Benz CLA: NVIDIA confirmed that the new Mercedes-Benz CLA, powered by Alpha-Mayo, is already in production and recently earned the highest safety rating from NCAP (New Car Assessment Program). The vehicle offers hands-free highway driving and end-to-end urban autonomous navigation, with enhanced capabilities rolling out in the US market later in 2026. Every line of code, chip, and system component has undergone formal safety certification.
NVIDIA also released:
A subset of the datasets used to train Alpha-Mayo, released for researcher fine-tuning
Alpha-Sim, an open-source simulation framework for evaluating Alpha-Mayo performance
Tools enabling developers to combine real and synthetic data for custom autonomous driving applications
Robotics Partnerships and Industrial Integration
Beyond transportation, NVIDIA announced broad robotics collaborations. Leading companies—Boston Dynamics, Franka Robotics, Surgical, LG Electronics, NEURA, XRLabs, and Logic Robotics—are building systems on NVIDIA Isaac (simulation and development platform) and GR00T (a foundation model for robotics).
Additionally, NVIDIA unveiled a strategic partnership with Siemens. The collaboration integrates NVIDIA CUDA-X libraries, AI models, and Omniverse digital twins into Siemens’ EDA, CAE, and digital-twin tools. This positions physical AI across the entire lifecycle from design and simulation to manufacturing operations and real-world deployment.
The Strategy: Open Source Velocity Meets Hardware Lock-In
The 1.5-hour keynote crystallized NVIDIA’s dual strategy heading into the inference era. On one hand, the company is aggressively open-sourcing models, datasets, and development tools. On the other, it is rendering its hardware, interconnects, and system designs increasingly irreplaceable through deep co-optimization.
This creates a virtuous cycle:
Open-source models and tools accelerate adoption
Broader adoption drives demand for inference scale
Inference scale requires NVIDIA’s specialized hardware to achieve cost-effective performance
As token volumes expand, customers become locked into NVIDIA infrastructure
The system-level design philosophy—from NVLink 6 interconnects to the Inference Context Memory Storage Platform—makes it difficult for competitors to replicate NVIDIA’s total cost of ownership advantage. What looks like NVIDIA “opening up” via Nemotron and Blueprints actually strengthens the company’s moat by making its platform the obvious choice for AI developers seeking both flexibility and performance.
As the AI industry transitions from training-dominant to inference-dominant workloads, this closed-loop strategy of continuous demand expansion, token-cost reduction, and infrastructure lock-in is widening NVIDIA’s economic moat to levels that may prove insurmountable for competitors seeking to gain traction in the inference and physical AI eras.