GLM-5.1 REAP series models released, offering multiple quantization and pruning variants

ME News reported on April 22 (UTC+8) that the GLM-5.1 REAP series models have been released, derived from the 744-billion-parameter BF16 model GLM-5.1. The series is produced through REAP pruning combined with several quantization techniques, so that variants fit different hardware. REAP pruning scores each expert's contribution in the mixture-of-experts model, removes the lowest-contributing experts, and renumbers the router gates accordingly to minimize quality loss. The series offers multiple core variants, including BF16, NVFP4, GPTQ W4A16, and GGUF formats, with on-disk sizes ranging from roughly 285 GB to 1,125 GB, targeting NVIDIA GPU architectures such as Hopper, Ampere, and Blackwell as well as CPU inference. All models are licensed under the MIT License and can be deployed with engines such as sglang, vLLM, or llama.cpp. (Source: InFoQ)
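For readers who want the mechanics, a minimal sketch of the expert-pruning step described above could look like the following PyTorch snippet. The saliency metric, tensor shapes, and function names here are illustrative assumptions, not the published REAP implementation:

```python
import torch

def prune_experts(gate_weight: torch.Tensor,
                  expert_saliency: torch.Tensor,
                  keep_ratio: float = 0.75):
    """Drop the least-salient experts from an MoE layer's router.

    gate_weight:     [num_experts, hidden] router projection matrix.
    expert_saliency: [num_experts] contribution score per expert
                     (e.g. mean routed weight times expert output norm
                     over calibration data -- an illustrative choice).
    Returns the pruned gate matrix and the kept expert indices, so
    downstream code can renumber the expert modules to match.
    """
    num_experts = gate_weight.shape[0]
    num_keep = max(1, int(num_experts * keep_ratio))
    # Keep the highest-contribution experts, in ascending index order.
    kept = torch.topk(expert_saliency, num_keep).indices.sort().values
    # Slicing the gate rows renumbers routing targets to 0..num_keep-1,
    # matching the compacted list of surviving expert weights.
    return gate_weight[kept], kept

# Toy usage: 8 experts, keep 6 (25% pruned).
gate = torch.randn(8, 4096)
saliency = torch.rand(8)
pruned_gate, kept_ids = prune_experts(gate, saliency, keep_ratio=0.75)
print(pruned_gate.shape, kept_ids.tolist())
```

The renumbering is the key detail: once low-contribution experts are deleted, the survivors are compacted to consecutive indices and the router's output rows are sliced to match, so routed tokens still land on valid experts.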

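As a rough deployment illustration using vLLM's Python API, a sketch might look as follows; the model path and parallelism setting are placeholders to adapt to your hardware, not values from the release:

```python
from vllm import LLM, SamplingParams

# Placeholder path: substitute the GLM-5.1 REAP variant that fits
# your hardware (e.g. a GPTQ W4A16 checkpoint for Ampere GPUs).
llm = LLM(
    model="/models/GLM-5.1-REAP-variant",
    tensor_parallel_size=8,  # shard the MoE weights across 8 GPUs
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Summarize expert pruning in one sentence."], params)
print(outputs[0].outputs[0].text)
```

The GGUF variants instead target llama.cpp, which suits CPU inference or smaller GPUs where the full BF16 checkpoint would not fit.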