Yifan Zhang Discloses DeepSeek V4 Complete Technical Specs: 1.6T Parameters, 384 Experts with 6 Activated

Gate News, April 22: Princeton PhD student Yifan Zhang disclosed the complete technical specifications for DeepSeek V4 on X, following a preview on April 19. V4 has 1.6 trillion total parameters, and a lightweight variant, V4-Lite, has 285 billion parameters.

The model employs the DSA2 attention mechanism, which combines DSA (DeepSeek Sparse Attention), DeepSeek’s earlier design from V3.2, and NSA (Native Sparse Attention) with 512-dimensional head embeddings. It is paired with Sparse Multi-Query Attention (MQA) and Sliding Window Attention (SWA). The MoE (Mixture of Experts) layer contains 384 experts, 6 of which are activated per forward pass, and uses a Fused MoE Mega-Kernel. Residual connections use the Hyper-Connections architecture.
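For readers unfamiliar with sparse expert routing, the 384-expert, 6-active figure can be illustrated with a minimal top-k routing sketch for a single token. Only the two counts come from the disclosed specs. The softmax over the selected experts, the helper names `route_token` and `moe_forward`, and the toy experts in the usage note are assumptions made for illustration and are not DeepSeek's implementation.

```python
import math

NUM_EXPERTS = 384  # expert count from the disclosed specs
TOP_K = 6          # active experts from the disclosed specs


def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and return (index, gate_weight) pairs.

    Assumption: gate weights are a softmax over the selected logits only.
    The actual gating function of V4 has not been disclosed.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(token, router_logits, experts, k=TOP_K):
    """Combine the outputs of the selected experts, weighted by their gates."""
    out = [0.0] * len(token)
    for idx, gate in route_token(router_logits, k):
        expert_out = experts[idx](token)  # only k of the experts are ever called
        out = [o + gate * v for o, v in zip(out, expert_out)]
    return out
```

With toy experts such as `experts = [lambda x, s=i: [s * v for v in x] for i in range(NUM_EXPERTS)]`, a call to `moe_forward` evaluates only 6 of the 384 expert functions, roughly 1.6% of the expert pool, which is why total parameter count and per-token compute diverge so sharply in MoE models.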

Training details revealed for the first time include the Muon optimizer (which applies Newton-Schulz orthogonalization to momentum updates), a 32K-token pre-training context window, and GRPO (Group Relative Policy Optimization) with KL divergence correction during reinforcement learning. The final context window extends to 1 million tokens. The model is text-only.
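The idea of orthogonalizing a momentum update with Newton-Schulz iterations can be sketched in a few lines. The example below uses the textbook cubic iteration X ← 1.5·X − 0.5·X·Xᵀ·X on a Frobenius-normalized matrix. Published Muon implementations typically use a tuned higher-order polynomial instead, and nothing here reflects DeepSeek's training code. The function names, learning rate, and momentum coefficient are illustrative assumptions.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product for small list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g, steps=20):
    """Approximate the nearest (semi-)orthogonal matrix to g.

    Swapped-in technique: the classic cubic Newton-Schulz iteration,
    not the tuned polynomial used in Muon reference code.
    """
    norm = math.sqrt(sum(v * v for row in g for v in row)) + 1e-12
    x = [[v / norm for v in row] for row in g]  # singular values now lie in (0, 1]
    for _ in range(steps):
        x_xt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(rx, rc)]
             for rx, rc in zip(x, x_xt_x)]
    return x


def muon_like_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One illustrative update: accumulate momentum, orthogonalize it, apply it.

    lr and beta are placeholder values, not disclosed hyperparameters.
    """
    momentum = [[beta * m + g for m, g in zip(rm, rg)]
                for rm, rg in zip(momentum, grad)]
    update = newton_schulz_orthogonalize(momentum)
    weight = [[w - lr * u for w, u in zip(rw, ru)]
              for rw, ru in zip(weight, update)]
    return weight, momentum
```

The orthogonalized update has all of its singular values pushed toward 1, so every direction in the weight matrix receives a step of comparable size regardless of how unevenly scaled the raw momentum is.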

Zhang is not employed by DeepSeek, and the company has not officially commented on the disclosed information.

