Written by: Ada, Deep Tide TechFlow
Ruoming Pang hadn’t even settled into his seat at Meta before leaving.
In July 2025, Zuckerberg lured this highly sought-after Chinese AI infrastructure engineer away from Apple with a multi-year compensation package reportedly worth over $200 million. Pang was assigned to Meta’s Superintelligence Lab to build the infrastructure for its next-generation AI models.
Seven months later, OpenAI poached him.
According to The Information, OpenAI ran a months-long recruitment campaign targeting Ruoming Pang. Although he told colleagues he was “very happy working at Meta,” he ultimately chose to leave. Bloomberg reported that his Meta compensation was tied to milestones, so leaving early meant forfeiting most of his unvested equity.
$200 million can’t buy seven months of loyalty.
This is not just a simple job switch story.
Ruoming Pang is not the first to leave.
Last week, Mat Velloso, head of the developer platform at Meta’s Superintelligence Lab, also announced his departure, less than eight months after leaving Google DeepMind to join Meta in July 2025. Going further back: in November 2025, Yann LeCun, the Turing Award winner who had served as Meta’s Chief AI Scientist for 12 years, announced he was leaving to pursue his “world model” vision. Russ Salakhutdinov, Geoffrey Hinton’s former PhD student and Meta’s VP of Generative AI Research, also recently announced his departure.
To understand the talent drain at Meta AI, we must first grasp how damaging Llama 4 really was.
In April 2025, Meta proudly released the Llama 4 series, including the Scout and Maverick models. The official materials touted impressive numbers, claiming wins over GPT-4.5 and Claude Sonnet 3.7 on core benchmarks such as MATH-500 and GPQA Diamond.
However, this flagship carrying Meta’s ambitions was quickly exposed by third-party blind tests in the open-source community, which revealed a stark gap between its real generalization and reasoning abilities and the hype. Facing strong community skepticism, Chief AI Scientist Yann LeCun eventually admitted that during testing, “different model versions were used for different test sets to optimize the final scores.”
In rigorous AI research and engineering circles, this crosses an unforgivable red line. In effect, Meta trained Llama 4 into a “test-taking machine” that excels only at questions it has already seen, rather than a genuinely cutting-edge model. It is like sending a math prodigy to the math exam and a programming prodigy to the programming exam: each looks impressive on its own, but they are not the same model.
In AI research this practice is called “cherry-picking”; in exam-oriented education, it is called cheating.
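To see why cherry-picking inflates scores, here is a toy sketch with entirely made-up numbers (none are real Llama 4 results): if a lab trains several variants of a model and reports, for each benchmark, whichever variant happened to score best, the published table can beat every individual model the lab actually has.

```python
import random

random.seed(0)

BENCHMARKS = ["MATH-500", "GPQA-Diamond", "MMLU-Pro"]
N_VARIANTS = 5

# Hypothetical per-benchmark scores for five fine-tuned variants of the
# same base model (illustrative random numbers only).
scores = {
    b: [round(random.uniform(0.60, 0.80), 3) for _ in range(N_VARIANTS)]
    for b in BENCHMARKS
}

# Honest reporting: pick ONE variant and report it on every benchmark.
honest_variant = 0
honest = {b: scores[b][honest_variant] for b in BENCHMARKS}

# Cherry-picking: for each benchmark, report whichever variant did best.
cherry = {b: max(scores[b]) for b in BENCHMARKS}

for b in BENCHMARKS:
    print(f"{b}: honest={honest[b]:.3f}  cherry-picked={cherry[b]:.3f}")

# The cherry-picked number can never be lower than the honest one.
assert all(cherry[b] >= honest[b] for b in BENCHMARKS)
```

No single deployable model corresponds to the cherry-picked column, which is exactly why blind third-party tests on one fixed checkpoint exposed the gap.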
For Meta, which has long positioned itself as an “open-source lighthouse,” this scandal destroyed its most valuable trust asset within the developer ecosystem. The direct consequence was that Zuckerberg lost confidence in the original GenAI team’s engineering standards, leading to a series of appointments of external executives and sidelining core infrastructure departments.
He spent about $14.3 billion acquiring a 49% stake in data-annotation company Scale AI and installed its 28-year-old CEO, Alexandr Wang, as Meta’s Chief AI Officer, establishing the Meta Superintelligence Lab (MSL). In the new structure, LeCun, a Turing Award winner, reported to the far younger Wang. In October, Meta cut about 600 positions at MSL, including members of FAIR, the research division LeCun founded.
Meanwhile, the flagship model originally scheduled for release in summer 2025, Llama 4 Behemoth, was repeatedly delayed—from summer to fall, and ultimately indefinitely shelved.
Meta shifted focus to developing next-generation models codenamed “Avocado” (text) and “Mango” (image/video). Reports indicate that Avocado aims to compete with GPT-5 and Gemini 3 Ultra. Originally scheduled for late 2025, it was postponed to Q1 2026 due to underperformance in testing and training optimization. Meta is also considering a closed-source release, abandoning its traditional open-source approach for the Llama series.
Meta made two fatal errors in AI modeling: first, faking benchmark results, which destroyed trust in the developer community; second, forcing foundational research teams like FAIR into product organizations driven by quarterly KPIs. These two issues are the root causes of the current talent exodus.
Talent is fleeing, and the chip effort is faltering too.
According to The Information, Meta recently canceled its most advanced internal AI training chip project.
Meta’s self-developed chip plan is called MTIA (Meta Training and Inference Accelerator). The initial roadmap was ambitious: MTIA v4 (“Santa Barbara”), v5 (“Olympus”), and v6 (“Universal Core”) were planned for delivery between 2026 and 2028. Olympus was designed as Meta’s first chip based on 2nm chiplet architecture, aiming to cover high-end model training and real-time inference, ultimately replacing Nvidia in Meta’s training clusters.
Now, this cutting-edge training chip project has been canceled.
Meta has made some progress—its inference chips, codenamed “Iris,” have been deployed at scale in Meta’s data centers, mainly for Facebook Reels and Instagram recommendation systems, reportedly reducing overall costs by 40-44%. But inference and training are different beasts. Inference runs models; training develops models. Meta can produce inference chips but cannot yet build training chips capable of competing directly with Nvidia.
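The distinction can be made concrete with a toy one-weight model (purely illustrative; a real training chip must accelerate exactly the extra steps shown here, the backward pass and the weight update, at data-center scale):

```python
# A toy single-weight "model" to illustrate inference vs. training.
w = 2.0  # the model's one trainable weight

def inference(x):
    # Inference: forward pass only. The weight is frozen.
    return w * x

def train_step(x, target, lr=0.1):
    # Training: forward pass PLUS gradient computation and weight update.
    global w
    pred = w * x                      # forward
    grad = 2 * (pred - target) * x    # backward: d/dw of (pred - target)^2
    w -= lr * grad                    # optimizer step: mutates the model
    return (pred - target) ** 2       # squared-error loss

print(inference(3.0))                 # forward only; the model is unchanged
loss_before = train_step(3.0, 9.0)
loss_after = train_step(3.0, 9.0)
assert loss_after < loss_before       # training actually changed the model
```

An inference chip only ever has to run the `inference` path; a training chip must also handle the gradient math and weight updates, with far heavier memory-bandwidth and interconnect demands, which is why success at the former says little about the latter.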
This isn’t the first time. In 2022, Meta attempted to develop inference chips but failed in small-scale deployment and abandoned the project, turning instead to Nvidia for large orders.
The setback in self-developed chips has accelerated Meta’s reliance on external procurement.
In January 2026, Meta announced a capital expenditure budget of $115 billion to $135 billion—almost double last year’s $72.2 billion. The majority of this will be spent on chips.
Within ten days, three major deals were finalized:
On February 17, Meta signed a multi-year, multi-generation strategic partnership with Nvidia. Meta will deploy “millions” of Nvidia Blackwell and next-gen Vera Rubin GPUs, plus Grace CPUs. Analysts estimate the deal is worth hundreds of billions of dollars, making Meta the first supercomputing customer to deploy Nvidia’s Grace CPUs at scale.
On February 24, Meta signed a $60-100 billion multi-year chip agreement with AMD, purchasing AMD’s latest MI450 series GPUs and sixth-generation EPYC CPUs. As part of the deal, AMD issued Meta warrants for up to 160 million common shares—about 10% of AMD’s stock—at $0.01 per share, vesting in stages based on delivery milestones.
On February 26, The Information reported that Meta signed a multi-billion-dollar multi-year agreement with Google to rent Google Cloud’s TPU chips for training and running its next-generation large language models. The two are also discussing Meta’s direct purchase of TPU deployments starting in 2027.
A social media giant, within ten days, placed orders with three chip suppliers at once, potentially totaling hundreds of billions of dollars.
This is not diversification. It’s panic buying.
Why is Meta in such a rush?
First, self-developed chips are no longer reliable. The most advanced training chip project was canceled, meaning Meta will have to rely on external hardware to meet AI training needs in the foreseeable future. While the inference chips (MTIA Iris) can handle recommendation systems, training cutting-edge models like Avocado (GPT-5 level) requires Nvidia or equivalent hardware.
Second, competitors won’t wait. OpenAI has secured massive resources from Microsoft, SoftBank, and UAE sovereign funds. Anthropic has locked in up to a million TPUs from Google plus Trainium capacity from Amazon. Google’s Gemini 3 is trained entirely on TPUs. If Meta cannot secure enough compute, it risks losing its place in the race.
Third, and perhaps most fundamentally, Zuckerberg needs to use “purchasing power” to compensate for a lack of “R&D strength.” The failures of Llama 4, talent drain, and chip setbacks have made Meta’s AI narrative fragile in Wall Street’s eyes. Signing big orders with Nvidia, AMD, and Google signals: “We have money, we’re buying, we’re not giving up.”
Meta’s current strategy is: if software can’t be fixed, then buy hardware; if talent can’t be retained, then buy chips. But AI competition isn’t won by writing checks. Compute power is necessary but not sufficient. Without top-tier model teams and a clear technical roadmap, even the most expensive chips are just costly inventory.
Looking back at Meta’s three February deals, there is a subtle detail most observers overlook:
Meta is buying current-generation Nvidia Blackwell and future Vera Rubin GPUs; from AMD, MI450 and upcoming MI455X; and renting Google’s TPU chips, planning to buy them outright next year.
Three different hardware architectures and software ecosystems.
This means Meta must repeatedly switch between Nvidia’s CUDA, AMD’s ROCm, and Google’s XLA/JAX. While multi-supplier strategies can diversify supply chain risks and lower hardware costs, they exponentially increase engineering complexity.
This is Meta’s most critical weakness: training a trillion-parameter model efficiently across three vastly different low-level programming models and hardware architectures requires more than engineers familiar with CUDA; it demands architects capable of building cross-platform training frameworks from scratch.
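For a flavor of what that cross-platform problem looks like, here is a deliberately tiny sketch, assuming a hypothetical backend registry (`CudaBackend`, `RocmBackend`, and `XlaBackend` are placeholder names, not real APIs): the training step is written once against an abstract interface, and each backend supplies its own kernels and collectives.

```python
from abc import ABC, abstractmethod

class Backend(ABC):
    """Abstract hardware backend: kernels + collectives."""
    name: str

    @abstractmethod
    def matmul(self, a, b): ...

    @abstractmethod
    def all_reduce(self, grads): ...

class CudaBackend(Backend):
    name = "cuda"  # in a real system this would wrap cuBLAS / NCCL

    def matmul(self, a, b):
        # Naive pure-Python matmul standing in for a GPU kernel.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]

    def all_reduce(self, grads):
        # Stand-in for a cross-device gradient average.
        return sum(grads) / len(grads)

class RocmBackend(CudaBackend):
    name = "rocm"  # in a real system: rocBLAS / RCCL

class XlaBackend(CudaBackend):
    name = "xla"   # in a real system: lowered to XLA HLO for TPUs

REGISTRY = {cls.name: cls for cls in (CudaBackend, RocmBackend, XlaBackend)}

def train_step(backend_name, activations, weights, grads):
    """One backend-agnostic 'training step': a matmul plus a gradient
    all-reduce, dispatched to whichever hardware is configured."""
    be = REGISTRY[backend_name]()
    out = be.matmul(activations, weights)
    avg_grad = be.all_reduce(grads)
    return out, avg_grad

# The same step runs unchanged on all three "architectures".
for name in REGISTRY:
    out, g = train_step(name, [[1.0, 2.0]], [[3.0], [4.0]], [0.1, 0.3])
    print(name, out, g)  # same result from every backend
```

The hard part, of course, is everything this sketch hides: writing and tuning the per-backend kernels, memory layouts, and collective-communication schedules so that a trillion-parameter run is efficient on each platform. That is the work only a handful of architects can do.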
Such talent likely numbers fewer than a hundred people worldwide. Ruoming Pang is one of them.
Spending $100 billion to acquire the world’s most complex hardware setup, while losing the brains capable of harnessing it—that’s the most surreal scene in Zuckerberg’s high-stakes gamble.
Zooming out, Zuckerberg’s AI strategy over the past 18 months closely resembles his all-in approach to the Metaverse:
Spot a trend, pour in money, hire aggressively, face setbacks, pivot strategy, then pour in more money.
From 2021 to 2023 it was the Metaverse: Reality Labs lost more than $10 billion a year, and the stock dropped from $380 to $88. From 2024 to 2026 it is AI: again the reckless spending, the frequent reorganizations, and the “trust me, I have a vision” narrative.
The difference is that AI’s prospects are far more tangible than the Metaverse’s were. And Meta has cash to burn: its advertising business generates abundant cash flow, with Q4 2025 revenue hitting $59.9 billion, up 24%.
The problem: money can buy chips, compute, and even the seats, but not the people who stay.
Ruoming Pang chose OpenAI; Russ Salakhutdinov left; LeCun started his own venture.
Zuckerberg’s bet now is: if he can buy enough chips, build enough data centers, and spend enough money, he can find or train the talent to use these resources.
This bet might pay off. Meta is one of the wealthiest tech companies globally, with over $100 billion in operating cash flow as its strongest moat. From OpenAI to Anthropic, from Google to competitors, Meta continues to poach talent. According to QuantumBit, nearly 40% of Meta’s Superintelligence team of 44 come from OpenAI.
But the brutal truth of AI competition is that compute reserves, talent rosters, and model benchmarks are all out in the open. The Llama 4 benchmark scandal proved that in this industry, you can’t sustain a lead on slide decks and PR.
Ultimately, the market only cares about one thing: is your model good enough?
As the AI arms race enters 2026, the pecking order is becoming clearer:
At the top are OpenAI and Google. OpenAI has the strongest models, largest user base, and most aggressive funding. Google has full vertical integration—self-developed chips, models, and cloud infrastructure. Anthropic follows closely, leveraging Claude’s product strength and dual compute supply from Google and Amazon, firmly in the first tier.
Meta? It has spent the most, signed the biggest chip contracts, and reorganized most frequently, but so far, it has yet to produce a front-line model convincing enough for the market.
Meta’s AI story is somewhat like Yahoo in 2005. Yahoo was also one of the wealthiest internet companies, rushing acquisitions and spending wildly, yet unable to produce a search engine on par with Google. Money isn’t everything. Zuckerberg needs to clarify what Meta’s AI goal really is, rather than chasing every hot trend.
Of course, it’s too early to write Meta’s obituary. With 3.58 billion monthly active users, $59.9 billion quarterly revenue, and the world’s largest social data set, Meta’s assets are hard for any competitor to replicate.
If the next-generation model codenamed Avocado ships on time in 2026 and puts Meta back in the top tier, Zuckerberg’s spending and restructuring will be remembered as strategic resilience. But if it falls short once again, the $135 billion will have bought little more than warehouses full of expensive, idle silicon.
After all, Silicon Valley’s AI arms race has never lacked big spenders. What’s missing is the talent to turn that compute into the future.