On February 11, 2026, Zhipu AI released GLM-5 — its most capable large language model to date — through its Z.ai platform. With 744 billion total parameters and a Mixture-of-Experts (MoE) architecture, GLM-5 claims the top spot among open-weight models on Artificial Analysis and LMArena’s Text Arena, while pricing itself at a fraction of closed-source rivals like Claude Opus 4.6.
GLM-5 roughly doubles the parameter count of its predecessor GLM-4.5 (355B total, 32B active), landing at 744B total parameters with 40B active per token. The model spans 80 layers with 256 experts, of which 8 are active at inference time, meaning only about 5.4% of parameters (and 3.1% of experts) are exercised per token.
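The sparsity figures follow directly from the published counts; a quick back-of-the-envelope check, using only the numbers from the announcement:

```python
# Back-of-the-envelope check of GLM-5's reported MoE sparsity figures.
total_params = 744e9      # total parameters
active_params = 40e9      # parameters activated per token
total_experts = 256       # routed experts per MoE layer
active_experts = 8        # experts selected per token

param_sparsity = active_params / total_params      # fraction of weights used per token
expert_sparsity = active_experts / total_experts   # fraction of experts used per token

print(f"parameter activation: {param_sparsity:.1%}")  # 5.4%
print(f"expert activation:    {expert_sparsity:.1%}")  # 3.1%
```

Note that the two ratios differ because shared components (embeddings, attention, any shared experts) are always active regardless of routing.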
Training consumed 28.5 trillion tokens across all stages, with special emphasis on code and reasoning data — including 160 billion unique tokens sourced from issue-PR pairs for software engineering tasks. Context length was progressively extended from 32K to 128K and finally to 200K tokens during mid-training.
Zhipu reports headline results for GLM-5 across multiple evaluation domains.
One standout claim involves the AA-Omniscience Index, where GLM-5 scored -1, a 35-point improvement over its predecessor. The metric measures whether a model knows when to abstain rather than fabricate, and Zhipu positions GLM-5 as the industry leader in knowledge reliability and low hallucination rates.
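Why is a score of -1 impressive? Abstention-aware benchmarks typically reward correct answers, penalize confident wrong ones, and treat abstentions as neutral, so a score near zero means wrong answers are almost fully balanced by correct ones. A minimal sketch of such a scoring rule (the exact formula here is an illustrative assumption, not Artificial Analysis's published implementation):

```python
def abstention_aware_index(correct: int, incorrect: int, abstained: int) -> float:
    """Hypothetical abstention-aware score: +1 per correct answer,
    -1 per incorrect answer, 0 per abstention, scaled to [-100, 100]."""
    total = correct + incorrect + abstained
    return 100 * (correct - incorrect) / total

# A model that abstains on hard questions scores higher than one that
# guesses wrong, even with the same number of correct answers.
print(abstention_aware_index(correct=40, incorrect=20, abstained=40))  # 20.0
print(abstention_aware_index(correct=40, incorrect=60, abstained=0))   # -20.0
```

Under any scoring of this shape, hallucinating on hard questions actively hurts, which is why most models land well below zero.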
Beyond architecture, Zhipu invested heavily in a four-stage post-training pipeline powered by a new reinforcement learning infrastructure called slime.
The agentic training dataset included more than 10,000 verifiable software engineering environments across thousands of repositories spanning 9 programming languages, as well as multi-hop search tasks derived from 2+ million web pages.
Zhipu frames GLM-5 as a deliberate shift from “vibe coding” — ad-hoc, prompt-driven code generation — toward what it calls agentic engineering: autonomous decomposition of complex, long-horizon software tasks with minimal human intervention. The model includes a native “Agent Mode” that can convert raw prompts or source materials directly into professional output files (Word documents, PDFs, spreadsheets).
The practical implication is a model designed to function less like a code autocomplete tool and more like an automated engineering team member capable of navigating multi-step tasks across entire codebases.
GLM-5 is available under an MIT license on Hugging Face (zai-org/GLM-5), with 15 quantized variants. API access through Z.ai is priced at $1.00 per million input tokens and $3.20 per million output tokens — approximately 5× cheaper on input and 8× cheaper on output compared to Claude Opus 4.6. The model is also available via OpenRouter and NVIDIA NIM.
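At those list prices, per-request costs are easy to estimate; a small helper using the launch rates quoted above (prices current as of the announcement and subject to change):

```python
# Z.ai API list prices for GLM-5 at launch (USD per million tokens).
GLM5_INPUT_PER_M = 1.00
GLM5_OUTPUT_PER_M = 3.20

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single GLM-5 API call at launch pricing."""
    return (input_tokens * GLM5_INPUT_PER_M
            + output_tokens * GLM5_OUTPUT_PER_M) / 1_000_000

# e.g. a 20K-token prompt producing a 2K-token answer:
print(f"${request_cost(20_000, 2_000):.4f}")  # $0.0264
```

The same call at Claude Opus 4.6's quoted multiples would cost roughly 5-8x more, which is the comparison Zhipu is leaning on.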
Zhipu acknowledged compute constraints at launch: “Even before the GLM-5 launch, we were pushing every chip to its limit just to serve inference.” The rollout to subscription users will be gradual. Notably, GLM-5 was built with full compatibility for Chinese GPU ecosystems (Huawei Ascend, Moore Threads, Hygon, Cambricon, and others), with W4A8 quantization support — a deliberate hedge against US export restrictions on NVIDIA chips.
Market reaction was immediate: Zhipu’s Hong Kong-listed shares surged roughly 28–34% on the day of the announcement, pushing the company’s valuation to approximately US$23 billion.
