GLM-5.1 REAP series models released, offering multiple quantization and pruning variants

ME News reported on April 22 (UTC+8) that the GLM-5.1 REAP series models have been released, derived from the 744-billion-parameter BF16 model GLM-5.1. The series is produced through REAP pruning combined with several quantization techniques, so that variants fit different hardware. REAP pruning scores each expert's contribution in the mixture-of-experts model, removes the lowest-contributing experts, and renumbers the router gates accordingly to minimize quality loss. The series offers multiple core variants, including BF16, NVFP4, GPTQ W4A16, and GGUF formats, with on-disk sizes ranging from roughly 285 GB to 1,125 GB, targeting NVIDIA GPU architectures such as Hopper, Ampere, and Blackwell as well as CPU inference. All models are licensed under the MIT License and can be deployed with engines such as sglang, vLLM, or llama.cpp. (Source: InFoQ)
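For readers who want the mechanics, a minimal sketch of the expert-pruning step described above could look like the following PyTorch snippet. The saliency metric, tensor shapes, and function names here are illustrative assumptions, not the published REAP implementation:

```python
import torch

def prune_experts(gate_weight: torch.Tensor,
                  expert_saliency: torch.Tensor,
                  keep_ratio: float = 0.75):
    """Drop the least-salient experts from an MoE layer's router.

    gate_weight:     [num_experts, hidden] router projection matrix.
    expert_saliency: [num_experts] contribution score per expert
                     (e.g. mean routed weight times expert output norm
                     over calibration data -- an illustrative choice).
    Returns the pruned gate matrix and the kept expert indices, so
    downstream code can renumber the expert modules to match.
    """
    num_experts = gate_weight.shape[0]
    num_keep = max(1, int(num_experts * keep_ratio))
    # Keep the highest-contribution experts, in ascending index order.
    kept = torch.topk(expert_saliency, num_keep).indices.sort().values
    # Slicing the gate rows renumbers routing targets to 0..num_keep-1,
    # matching the compacted list of surviving expert weights.
    return gate_weight[kept], kept

# Toy usage: 8 experts, keep 6 (25% pruned).
gate = torch.randn(8, 4096)
saliency = torch.rand(8)
pruned_gate, kept_ids = prune_experts(gate, saliency, keep_ratio=0.75)
print(pruned_gate.shape, kept_ids.tolist())
```

The renumbering is the key detail: once low-contribution experts are deleted, the survivors are compacted to consecutive indices and the router's output rows are sliced to match, so routed tokens still land on valid experts.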

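As a rough deployment illustration using vLLM's Python API, a sketch might look as follows; the model path and parallelism setting are placeholders to adapt to your hardware, not values from the release:

```python
from vllm import LLM, SamplingParams

# Placeholder path: substitute the GLM-5.1 REAP variant that fits
# your hardware (e.g. a GPTQ W4A16 checkpoint for Ampere GPUs).
llm = LLM(
    model="/models/GLM-5.1-REAP-variant",
    tensor_parallel_size=8,  # shard the MoE weights across 8 GPUs
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Summarize expert pruning in one sentence."], params)
print(outputs[0].outputs[0].text)
```

The GGUF variants instead target llama.cpp, which suits CPU inference or smaller GPUs where the full BF16 checkpoint would not fit.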