Gate News, April 24 — OpenAI engineer Clive Chan has raised detailed objections to the hardware recommendations chapter of the V4 technical report, calling it "surprisingly mediocre and error-prone" compared with the acclaimed V3 version. V3's hardware guidance, whose Q&A sessions became the most popular discussion topic at the ISCA academic conference, offered specific recommendations aligned with industry interconnect standards; V4, by contrast, is far more vague.
Chan systematically challenged three key recommendations. On power consumption, the report suggests that software optimization allows chips to run compute, storage, and communication at full capacity simultaneously, and recommends that chip manufacturers reserve additional power headroom. Chan argues this is counterproductive: total chip power is constrained by physical process limitations, so reserving more power margin only reduces operating frequency, ultimately decreasing computational performance. Regarding GPU-to-GPU data transfer, the report advocates a pull model—where GPUs actively fetch data—over a push model, citing high notification overhead in push operations. Chan disputes this, contending that pull is actually slower and that improved network adapter capabilities would be preferable. However, the two may be discussing different layers of the issue: the report addresses notification mechanism overhead, while Chan refers to transmission latency itself.
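Chan's power argument can be made concrete with the standard dynamic-power model, under which dynamic power scales roughly with the cube of frequency (P ≈ C·V²·f, with voltage scaling with frequency). The sketch below is illustrative only: the 700 W budget, 2.0 GHz base clock, and cubic scaling are assumptions, not figures from the report.

```python
# Hedged sketch of Chan's claim that reserving power headroom costs
# frequency. Assumes P ∝ f^3 (dynamic power C*V^2*f with V ∝ f).
# All numbers are hypothetical.

def max_frequency(power_budget_w: float, headroom: float,
                  base_freq_ghz: float = 2.0,
                  base_power_w: float = 700.0) -> float:
    """Clock achievable when a fraction `headroom` (0..1) of the budget is reserved."""
    usable = power_budget_w * (1.0 - headroom)
    # P ∝ f^3  =>  f = f_base * (P_usable / P_base) ** (1/3)
    return base_freq_ghz * (usable / base_power_w) ** (1.0 / 3.0)

for h in (0.0, 0.1, 0.2):
    print(f"headroom {h:.0%}: {max_frequency(700.0, h):.2f} GHz")
# headroom 0%: 2.00 GHz
# headroom 10%: 1.93 GHz
# headroom 20%: 1.86 GHz
```

Under this model, every point of reserved margin directly lowers the sustainable clock, which is the mechanism behind Chan's objection.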
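The push-versus-pull disagreement can also be sketched as a toy latency model. Neither the report nor Chan gives numbers, so the constants below (link bandwidth, notification cost, request round-trip) are hypothetical; the point is only that push pays a per-message completion notification while pull pays an extra request round-trip before data flows, so which is faster depends on which overhead dominates.

```python
# Toy latency model contrasting push vs pull GPU-to-GPU transfers.
# All constants are hypothetical placeholders, not measurements.

def push_latency_us(msg_bytes: int, bw_gbps: float = 400.0,
                    notify_us: float = 1.0) -> float:
    transfer = msg_bytes * 8 / (bw_gbps * 1e3)  # bytes -> microseconds on the wire
    return transfer + notify_us                 # receiver-side notification cost

def pull_latency_us(msg_bytes: int, bw_gbps: float = 400.0,
                    rtt_us: float = 2.0) -> float:
    transfer = msg_bytes * 8 / (bw_gbps * 1e3)
    return rtt_us + transfer                    # request round-trip, then data

for size in (4 * 1024, 1024 * 1024):
    print(size, round(push_latency_us(size), 2), round(pull_latency_us(size), 2))
```

With these assumed constants the pull path is always one round-trip behind, which matches Chan's "pull is slower" claim, while the report's complaint targets the notification term in the push path — consistent with the two sides addressing different layers.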
On activation functions, the report recommends replacing SwiGLU with simpler functions to reduce computational burden. Chan sees no merit in this, noting that Sonic MoE has already demonstrated optimal performance using SwiGLU. Chan suspects DeepSeek may have “deliberately weakened this section.”
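For reference, SwiGLU is the gated activation SwiGLU(x) = Swish(xW) ⊙ (xV): two matrix multiplies plus an elementwise gate, versus one matrix multiply for a plain activation — the extra cost the report objects to. A minimal NumPy sketch (shapes illustrative):

```python
import numpy as np

def swish(z: np.ndarray) -> np.ndarray:
    return z / (1.0 + np.exp(-z))  # Swish / SiLU: z * sigmoid(z)

def swiglu(x: np.ndarray, W: np.ndarray, V: np.ndarray) -> np.ndarray:
    # Gated linear unit: Swish branch modulates the linear branch elementwise.
    return swish(x @ W) * (x @ V)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 16))
V = rng.standard_normal((8, 16))
print(swiglu(x, W, V).shape)  # (4, 16)
```

The second projection (V) and the gate roughly double the FLOPs of this sublayer relative to a single-branch activation, which is the saving the report's proposal targets and the quality trade-off Chan disputes.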