DeepSeek V4 Announces Abandonment of NVIDIA! How Far Has China's AI "Computing Power Independence" Breakthrough Gone?

動區BlockTempo

DeepSeek announces that its new generation V4 model will fully adopt domestically produced chips, no longer relying on NVIDIA GPUs. From the ZTE incident to the three rounds of chip bans, China’s AI industry is breaking through simultaneously through algorithm optimization, domestic replacements, and token exports.

(Background: Gao talks about DeepSeek: Absolutely not copying ChatGPT, bypassing NVIDIA’s CUDA platform through underlying technology)

(Additional context: FBI and White House investigation! U.S. suspects DeepSeek obtained NVIDIA chips through Singapore front companies)

Eight years ago, ZTE was stopped in its tracks. On April 16, 2018, a ban issued by the U.S. Department of Commerce's Bureau of Industry and Security brought ZTE Corporation, the world's fourth-largest telecom equipment manufacturer with 80,000 employees and annual revenue of over 100 billion RMB, to a halt overnight. The ban was simple: for the next seven years, no U.S. company could sell parts, products, software, or technology to ZTE.

Without Qualcomm chips, base station production stopped. Without a Google Android license, its phones had no usable operating system. Twenty-three days later, ZTE announced that its main business operations could no longer continue.

However, ZTE eventually survived, at a cost of $1.4 billion.

The $1 billion fine was paid in one lump sum, with another $400 million placed in escrow at a U.S. bank. In addition, the entire executive team was replaced and a U.S. compliance oversight team was installed. In 2018, ZTE posted a net loss of 7 billion RMB, with revenue down 21.4% year-over-year.

ZTE’s then-chairman Yin Yimin wrote in an internal letter: “We are in an industry that is complex and highly dependent on the global supply chain.” At the time, this was a reflection and a helpless acknowledgment.

Eight years later, on February 26, 2026, China’s AI unicorn DeepSeek announced that its upcoming V4 multimodal large model will prioritize deep cooperation with domestic chip manufacturers, achieving the first full-process non-NVIDIA solution from pretraining to fine-tuning.

In other words: We no longer need NVIDIA.

Once the news broke, the market's first reaction was skepticism. NVIDIA holds over 90% of the global AI training chip market. Does abandoning it make business sense?

But behind DeepSeek’s choice lies a bigger issue than business logic: What kind of computational independence does China’s AI truly need?

Many believe that chip bans target hardware. But what really suffocates Chinese AI companies is something called CUDA.

CUDA, short for Compute Unified Device Architecture, is a parallel computing platform and programming model launched by NVIDIA in 2006. It allows developers to directly call NVIDIA GPUs’ computing power to accelerate complex calculations.

Before the era of AI, CUDA was a tool for a few tech enthusiasts. But with the wave of deep learning, CUDA became the foundation of the entire AI industry.

Training large AI models is essentially massive matrix computations—precisely what GPUs excel at.
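To make "massive matrix computations" concrete, here is a minimal NumPy sketch of a single dense layer's forward pass; the shapes are illustrative, not taken from any real model. Everything a GPU accelerates during training reduces, at its core, to operations like this one, repeated trillions of times:

```python
import numpy as np

# One dense layer's forward pass is a single matrix multiply plus a bias add.
# Shapes below are illustrative, not from any particular model.
batch, d_in, d_out = 32, 1024, 4096

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, d_in))          # input activations
W = rng.standard_normal((d_in, d_out)) * 0.02   # weight matrix
b = np.zeros(d_out)                             # bias

y = x @ W + b                                   # the matmul GPUs excel at

# Cost of this one layer: 2 * batch * d_in * d_out multiply-add FLOPs
flops = 2 * batch * d_in * d_out
print(y.shape, flops)
```

A large model stacks thousands of such multiplies per token, which is why hardware built for parallel matrix math dominates AI training.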

Thanks to over a decade of early deployment, NVIDIA used CUDA to build a complete toolchain from hardware to application for AI developers worldwide. Today, all major AI frameworks—Google’s TensorFlow, Meta’s PyTorch—are deeply tied to CUDA.

A PhD student specializing in AI starts learning, programming, and experimenting in a CUDA environment from day one. Every line of code they write reinforces NVIDIA’s moat.

By 2025, the CUDA ecosystem has over 4.5 million developers, supporting more than 3,000 GPU-accelerated applications, with over 40,000 companies worldwide using CUDA. This means over 90% of global AI developers are tied into NVIDIA’s ecosystem.

CUDA’s power lies in its flywheel effect: the more developers use it, the more tools, libraries, and code are created, making the ecosystem more vibrant. A thriving ecosystem attracts even more developers. Once spinning, this wheel is almost unstoppable.

As a result, NVIDIA sells the most expensive shovels and defines the only way to mine. Want a different shovel? Sure. But you'll have to rebuild, from scratch, the decade of experience, tools, and code that hundreds of thousands of top minds worldwide accumulated under this paradigm.

Who bears this cost?

So when BIS's first round of controls, on October 7, 2022, restricted exports of NVIDIA's A100 and H100 chips to China, Chinese AI companies felt the chokehold for the first time. NVIDIA responded with "China-specific" A800 and H800 chips, cutting interconnect bandwidth to keep supply flowing.

But just a year later, on October 17, 2023, a second round of tighter controls banned A800 and H800, with 13 Chinese companies added to the entity list. NVIDIA had to release further crippled H20 chips. By December 2024, the last round of controls during Biden’s term further restricted H20 exports.

Three rounds of controls, layer upon layer.

But this time, the story is very different from the ZTE incident.

Under the bans, everyone thought China’s large-model dreams would end here.

They were wrong. Faced with blockade, Chinese companies did not choose direct confrontation but instead launched a breakout. The first battlefield of this breakout was not chips, but algorithms.

From late 2024 to 2025, Chinese AI companies collectively shifted toward a shared technical direction: mixture-of-experts (MoE) models.

Simply put, a huge model is split into many small "experts," and for each token only the most relevant experts are activated, rather than running the entire model.
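The routing idea can be sketched in a few lines of NumPy. This is illustrative only: real MoE layers such as DeepSeek's add load balancing, shared experts, and learned gating trained end-to-end; all names and shapes here are made up.

```python
import numpy as np

# Minimal mixture-of-experts routing sketch (illustrative, not DeepSeek's code).
# A router scores every expert for each token; only the top-k experts run.
n_experts, top_k, d = 8, 2, 16

rng = np.random.default_rng(0)
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]  # expert weights
router = rng.standard_normal((d, n_experts))                       # gating weights

def moe_forward(token):
    scores = token @ router
    probs = np.exp(scores - scores.max())      # softmax over expert scores
    probs /= probs.sum()
    chosen = np.argsort(probs)[-top_k:]        # indices of the top-k experts
    # Compute only the chosen experts; the other 6 never run for this token.
    out = sum(probs[i] * (token @ experts[i]) for i in chosen)
    return out, chosen

token = rng.standard_normal(d)
out, chosen = moe_forward(token)
print(len(chosen), out.shape)
```

Only 2 of the 8 experts do any work per token, which is exactly how a model can carry huge total parameters while spending a small fraction of them on each inference step.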

DeepSeek’s V3 exemplifies this approach. It has 671 billion parameters, but only 37 billion (about 5.5%) are active during inference. Training used 2,048 NVIDIA H800 GPUs over 58 days, costing $5.576 million. In comparison, estimates for GPT-4 training cost around $78 million—a difference of an order of magnitude.

Algorithmic optimization this extreme translates directly into cost. DeepSeek's API prices run as low as $0.028 to $0.28 per million input tokens and $0.42 per million output tokens. By contrast, GPT-4 charges $5 for input and $15 for output; Claude Opus costs even more, at $15 for input and $75 for output. In dollar terms, DeepSeek is dozens to hundreds of times cheaper than Claude.
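A back-of-envelope comparison using the per-million-token prices cited above (these are the article's figures, not an official rate card) shows how quickly the gap compounds on a single job:

```python
# Cost comparison using the per-million-token prices cited in the article.
# These are the article's figures, not official vendor rate cards.
prices = {
    "deepseek": {"in": 0.28, "out": 0.42},   # upper-bound input price cited
    "gpt4":     {"in": 5.00, "out": 15.00},
    "claude":   {"in": 15.00, "out": 75.00},
}

def job_cost(model: str, in_tokens_m: float, out_tokens_m: float) -> float:
    """USD cost of a job given input/output token counts in millions."""
    p = prices[model]
    return p["in"] * in_tokens_m + p["out"] * out_tokens_m

# A hypothetical agent task consuming 10M input and 2M output tokens:
for m in prices:
    print(m, round(job_cost(m, 10, 2), 2))
```

On this hypothetical 10M-in/2M-out workload the same job costs a few dollars on DeepSeek versus hundreds on Claude Opus, which is why per-token price dominates once agents, not chats, drive consumption.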

This price gap has a huge impact on the global developer market. By February 2026, on the world’s largest AI model API platform, OpenRouter, Chinese AI models’ weekly call volume surged 127% in three weeks, surpassing the U.S. for the first time. A year earlier, Chinese models accounted for less than 2% of OpenRouter’s market share. A year later, it grew by 421%, approaching 60%.

Behind this data is a subtle structural change: starting in late 2025, mainstream AI applications shifted from chat to agents. In agent scenarios, token consumption per task is 10 to 100 times higher than simple chat. As token consumption skyrockets, price becomes a decisive factor. Chinese models’ extreme price-performance advantage hits this window perfectly.

But lowering inference costs does not solve the fundamental training problem. A large model that cannot be continuously trained and iterated on the latest data will quickly degrade in capability. Training remains the black hole of compute power.

So, where does the “shovel” for training come from?

In Xinghua, a small city in central Jiangsu known for stainless steel and health foods and previously unconnected to AI, a 148-meter-long domestic computing server production line was built and put into operation in 2025, going from contract signing to production in just 180 days.

At the core of this line are two fully domestic chips: the Loongson 3C6000 processor and the Taichu Yuanqi T100 AI accelerator card. The Loongson 3C6000 is fully self-developed, from instruction set to microarchitecture. The Taichu Yuanqi was developed by teams from the National Supercomputing Center in Wuxi and Tsinghua University, and adopts a heterogeneous many-core architecture.

At full capacity, the line can produce a server every five minutes. The total investment is 1.1 billion RMB, with an annual capacity of 100,000 units.

More importantly, the cluster of these domestically produced chips has begun undertaking real large-model training tasks.

In January 2026, Zhipu AI and Huawei jointly released GLM-Image, the first state-of-the-art image generation model trained entirely on domestic hardware. In February, China Telecom's hundred-billion-parameter "Xingchen" large model completed full training on a 10,000-chip domestic cluster in Shanghai's Lingang.

These cases prove one thing: domestic chips have already crossed from “usable for inference” to “capable of training.” This is a qualitative leap. Inference only requires running a trained model; training demands processing massive data, complex gradient calculations, and parameter updates—requiring orders of magnitude higher compute power, interconnect bandwidth, and software ecosystem.
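The gap between "running a model" and "training one" can be made concrete with a standard FLOPs approximation (roughly 2·P FLOPs per token for a forward pass and about 6·P per token for a full training step). The token counts below are illustrative assumptions, not measured figures:

```python
# Rough FLOPs accounting for inference vs. training.
# Standard approximation: forward pass ~= 2*P FLOPs per token;
# a training step (forward + backward + update) ~= 6*P per token.
# Token counts are illustrative assumptions, not measured figures.
P = 37e9                      # active parameters per token (the cited 37B)
inference_tokens = 1_000      # one longish chat reply
training_tokens = 14.8e12     # a multi-trillion-token pretraining corpus

inference_flops = 2 * P * inference_tokens
training_flops = 6 * P * training_tokens

# Training dwarfs serving a single reply by roughly ten orders of magnitude.
print(f"ratio: {training_flops / inference_flops:.2e}")
```

Inference can tolerate a modest chip; training at this scale also demands sustained interconnect bandwidth and a mature software stack, which is why "capable of training" is the harder bar.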

The core force behind this is Huawei's Ascend chip series. By the end of 2025, the Ascend ecosystem had over 4 million developers and more than 3,000 partners, with 43 mainstream large models pretrained on Ascend and over 200 open-source models adapted to it. At MWC 2026 on March 2, Huawei launched SuperPoD, a new-generation computing platform for overseas markets.

The Ascend 910B's FP16 performance already rivals NVIDIA's A100. A gap remains, but the chips have moved from "unusable" to "usable," and are now heading toward "good enough." Ecosystem building cannot wait for a perfect chip; it has to start once chips are sufficient, letting real business workloads drive the iteration of both silicon and software.

ByteDance, Tencent, and Baidu are all doubling their use of domestically produced servers in 2026 compared with last year. According to MIIT data, China's AI computing capacity has reached 1,590 EFLOPS. 2026 is shaping up as the first year of at-scale domestic compute deployment.

In early 2026, Virginia, home to the world's largest concentration of data centers, paused approval of new data center projects. Georgia followed, extending its suspension to 2027. Illinois and Michigan also introduced restrictions.

According to IEA data, in 2024, U.S. data centers consumed 183 TWh of electricity, about 4% of the national total. By 2030, this is projected to double to 426 TWh, possibly over 12%. Arm’s CEO predicts that by 2030, AI data centers will consume 20-25% of U.S. electricity.

The U.S. power grid is already strained. The PJM grid covering 13 eastern states faces a 6 GW capacity shortfall. By 2033, the U.S. will face a 175 GW capacity gap—equivalent to the power needs of 130 million households. Wholesale power costs in data center hubs have increased by 267% over five years.

The limit of compute power is energy. And in this dimension, the gap between China and the U.S. is even larger than chips—just in the opposite direction.

China's annual power generation is 10.4 trillion kWh versus 4.2 trillion kWh for the U.S., roughly 2.5 times as much. More critically, residential use accounts for only 15% of China's electricity consumption versus 36% in the U.S., leaving China far more industrial power available for compute infrastructure.

In electricity prices, U.S. AI hubs pay about $0.12–$0.15 per kWh, while industrial electricity in western China is around $0.03—only a quarter to a fifth of U.S. rates.

China’s incremental power generation has already reached seven times that of the U.S.

While the U.S. worries about electricity, China’s AI quietly goes global. But this time, it’s not products or factories—it’s tokens.

Tokens, the smallest unit of information processing in AI models, are becoming a new digital commodity. They are produced in China’s compute factories and transported via submarine cables worldwide.

DeepSeek’s user distribution illustrates this well: 30.7% in China, 13.6% in India, 6.9% in Indonesia, 4.3% in the U.S., 3.2% in France. It supports 37 languages and is popular in emerging markets like Brazil. Over 26,000 companies have accounts, and 3,200 organizations deploy enterprise versions.

In 2025, 58% of new AI startups integrated DeepSeek into their tech stacks. In China, DeepSeek holds 89% of the market share. In other sanctioned countries, market share ranges from 40% to 60%.

This scene resembles a war over industry autonomy forty years ago.

In 1986, under intense U.S. pressure, Japan signed the U.S.–Japan Semiconductor Agreement. Its core terms: Japan had to open its semiconductor market so that foreign (chiefly U.S.) chips could take over 20% of it; Japanese semiconductors could not be exported below cost; and a 100% punitive tariff was imposed on $300 million of Japanese goods. The U.S. also blocked Fujitsu's acquisition of Fairchild Semiconductor.

At that time, Japan’s semiconductor industry was at its peak. By 1988, Japan controlled 51% of the global semiconductor market, the U.S. only 36.8%. The top ten semiconductor companies worldwide were dominated by Japan: NEC ranked second, Toshiba third, Hitachi fifth, Fujitsu seventh, Mitsubishi eighth, Panasonic ninth. In 1985, Intel lost $173 million in the U.S.-Japan semiconductor battle, nearly going bankrupt.

But after the agreement, everything changed.

The U.S. launched comprehensive suppression through tools like Section 301 investigations, while backing South Korea's Samsung and Hynix to undercut Japanese prices. Japan's DRAM share plummeted from 80% to 10%; by 2017, its share of the IC market was just 7%. The once-dominant giants were split up, acquired, or exited the business after years of losses.

The tragedy of Japan’s semiconductors was that they were content to be the best producer in a global division system dominated by external forces, never building their own independent ecosystem. When the tide receded, they found they had nothing but manufacturing.

Today’s Chinese AI industry faces a similar but fundamentally different crossroads.

The similarity: external pressure is immense. The three rounds of chip controls, layered upon each other, and the CUDA ecosystem barrier remains towering.

The difference: this time, we chose a harder path. From algorithmic extreme optimization, to domestic chips crossing from inference to training, to the 4 million developers in the Ascend ecosystem, to token exports penetrating global markets. Every step is building a kind of independent industry ecosystem that Japan never had.

On February 27, 2026, three performance reports from domestic AI chip companies were released simultaneously.

Cambricon’s revenue surged 453%, achieving full-year profit for the first time. Moore Threads grew 243% but still posted a net loss of 1 billion RMB. MuXi’s revenue increased 121%, with a net loss close to 800 million RMB.

Half is fire, half is sea.

The fire: extreme market hunger. The 95% gap left behind by Jensen Huang's NVIDIA is being filled, inch by inch, by these domestic companies' revenue figures. Whatever the gaps in performance or ecosystem, the market needs a second choice beyond NVIDIA. This is a rare structural opportunity torn open by geopolitics.

The sea: huge costs in ecosystem building. Every loss is real money spent chasing the CUDA ecosystem—R&D investments, software subsidies, engineers on-site solving compilation issues. These losses are not mismanagement but war taxes paid to build an independent ecosystem.

These three financial reports more honestly record the true face of this compute war than any industry analysis. It’s not a victorious march but a fierce, bloody, head-to-head battlefield.

But the nature of the war has changed. Eight years ago, we discussed whether we could survive. Today, we ask how much it costs to survive.

The cost itself is progress.
