GLM-5: Zhipu AI Ships a 744B Open-Weight Frontier Model

On February 11, 2026, Zhipu AI released GLM-5 — its most capable large language model to date — through its Z.ai platform. With 744 billion total parameters and a Mixture-of-Experts (MoE) architecture, GLM-5 claims the top spot among open-weight models on Artificial Analysis and LMArena’s Text Arena, while pricing itself at a fraction of closed-source rivals like Claude Opus 4.6.

Z.ai GLM-5 model interface and branding
Image credit: WinBuzzer

Architecture and Scale

GLM-5 roughly doubles the parameter count of its predecessor GLM-4.5 (355B total, 32B active), landing at 744B total parameters with 40B active per token (roughly 5.4% of the total). The model spans 80 layers with 256 experts, of which 8 are active for each token at inference time.
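The expert routing described above can be illustrated with a toy top-k MoE layer. This is a minimal sketch, not GLM-5's actual design: the hidden size, the linear "experts" standing in for FFN blocks, and the router are all placeholders; only the 256-experts/8-active shape comes from the article.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=8):
    """Route one token through a toy top-k Mixture-of-Experts layer.

    x: (d,) token hidden state; gate_w: (d, n_experts) router weights;
    expert_ws: list of (d, d) expert matrices (stand-ins for FFN experts).
    """
    logits = x @ gate_w                       # router score per expert
    top = np.argsort(logits)[-top_k:]         # indices of the top-k experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                      # softmax over the selected experts
    # Only the chosen experts run; the other 248 stay idle -- the sparsity
    # that keeps active parameters far below the total count.
    return sum(p * (x @ expert_ws[i]) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 256                        # 256 experts, 8 active, per the article
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
expert_ws = [rng.standard_normal((d, d)) * 0.1 for _ in range(n_experts)]
y = moe_forward(x, gate_w, expert_ws)
print(y.shape)  # → (16,)
```

Each token pays the compute cost of only 8 expert matmuls plus the router, which is why a 744B-parameter model can decode with 40B-parameter cost.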

Key architectural decisions include:

  • DeepSeek Sparse Attention (DSA): Replaces standard dense attention with dynamic token selection, cutting computation by roughly 1.5–2× on long sequences and enabling a 200K-token context window.
  • Multi-head Latent Attention with “Muon Split”: An optimization that matches the quality of Grouped-Query Attention while reducing KV-cache memory overhead.
  • Multi-token Prediction: GLM-5 averages 2.76 accepted tokens per speculative decoding step, outperforming DeepSeek-V3.2’s 2.55.
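The multi-token prediction figure can be made concrete with a toy simulation of speculative decoding: a draft head proposes several tokens, the model accepts a prefix of them, and one verified token is always emitted. The acceptance probability and draft length below are illustrative assumptions, not GLM-5's measured values.

```python
import random

def accepted_per_step(p_accept=0.8, draft_len=4, steps=10_000, seed=0):
    """Estimate mean tokens emitted per decoding step when each of
    `draft_len` speculative tokens is independently accepted with
    probability `p_accept`; the first rejection ends the run, and one
    verified token is always emitted on top of the accepted prefix."""
    rng = random.Random(seed)
    total = 0
    for _ in range(steps):
        accepted = 0
        for _ in range(draft_len):
            if rng.random() < p_accept:
                accepted += 1
            else:
                break               # first rejection ends the speculative run
        total += accepted + 1       # +1: the model's own verified next token
    return total / steps

print(round(accepted_per_step(), 2))
```

With these toy numbers the expected value is 1 + p + p² + p³ + p⁴ ≈ 3.36; a real system's figure, like the reported 2.76, depends on how well the draft head matches the model's actual distribution.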

Training consumed 28.5 trillion tokens across all stages, with special emphasis on code and reasoning data — including 160 billion unique tokens sourced from issue-PR pairs for software engineering tasks. Context length was progressively extended from 32K to 128K and finally to 200K tokens during mid-training.

Benchmark Performance

GLM-5’s headline achievements span multiple evaluation domains:

  • Artificial Analysis Intelligence Index v4.0: Scored 50 — the first open-weight model to reach this threshold
  • SWE-bench Verified: 77.8% (vs. Claude Opus 4.5’s 80.9%)
  • AIME 2026: 92.7
  • GPQA-Diamond: 86.0
  • HLE with Tools: 50.4
  • BrowseComp: 62.0
  • Vending-Bench 2: $4,432 final balance (#1 among open-source models)
  • τ²-Bench: 89.7
  • Terminal-Bench 2.0: 56.2
  • LMArena Text and Code Arenas: #1 among open-weight models

One standout claim involves the AA-Omniscience Index, where GLM-5 scored -1 — a 35-point improvement over its predecessor. This metric measures “knowing when to abstain rather than fabricate,” and Zhipu positions GLM-5 as the industry leader in knowledge reliability and low hallucination rates.

Z.ai chat interface powered by GLM-5
Image credit: Testing Catalog

Post-Training: The “slime” Framework

Beyond architecture, Zhipu invested heavily in a four-stage post-training pipeline powered by a new reinforcement learning infrastructure called slime:

  1. Reasoning RL: Using GRPO with an “IcePop” technique for mathematical and logical reasoning
  2. Agentic RL: Asynchronous, decoupled infrastructure supporting up to 1,000 concurrent rollouts for long-horizon agentic tasks
  3. General RL: Hybrid reward signals combining rule-based, outcome-model, and generative rewards
  4. On-Policy Cross-Stage Distillation: Prevents capability regression between training stages
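Step 1's GRPO objective scores each rollout against its own sampling group rather than a learned value network. The "IcePop" modification is not detailed in the source, so the sketch below shows only the standard group-relative advantage computation that GRPO is built on:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: each sampled response is
    normalized by the mean and std of its own group, so no separate
    value model is needed to estimate a baseline."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0   # guard a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Eight rollouts for one prompt: 1.0 = verified correct, 0.0 = incorrect.
group = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0]
print([round(a, 2) for a in grpo_advantages(group)])
# → [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, -1.0, 1.0]
```

Correct rollouts get a positive advantage and incorrect ones a negative advantage relative to the group, which pairs naturally with the verifiable rewards (unit tests, exact answers) used in the reasoning and agentic stages.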

The agentic training dataset included more than 10,000 verifiable software engineering environments across thousands of repositories spanning 9 programming languages, as well as multi-hop search tasks derived from 2+ million web pages.

From Vibe Coding to Agentic Engineering

Zhipu frames GLM-5 as a deliberate shift from “vibe coding” — ad-hoc, prompt-driven code generation — toward what it calls agentic engineering: autonomous decomposition of complex, long-horizon software tasks with minimal human intervention. The model includes a native “Agent Mode” that can convert raw prompts or source materials directly into professional output files (Word documents, PDFs, spreadsheets).

The practical implication is a model designed to function less like a code autocomplete tool and more like an automated engineering team member capable of navigating multi-step tasks across entire codebases.

Availability and Pricing

GLM-5 is available under an MIT license on Hugging Face (zai-org/GLM-5), with 15 quantized variants. API access through Z.ai is priced at $1.00 per million input tokens and $3.20 per million output tokens — approximately 5× cheaper on input and 8× cheaper on output compared to Claude Opus 4.6. The model is also available via OpenRouter and NVIDIA NIM.
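At the listed Z.ai rates, per-request cost is simple arithmetic; the session sizes below are made-up examples:

```python
def glm5_api_cost(input_tokens, output_tokens,
                  in_price=1.00, out_price=3.20):
    """USD cost at GLM-5's listed Z.ai rates (USD per million tokens)."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# A long agentic coding session: 2M input tokens, 500K output tokens.
print(f"${glm5_api_cost(2_000_000, 500_000):.2f}")  # → $3.60
```

The same session at rates 5× higher on input and 8× higher on output would cost $22.80, which is the gap the article's Claude Opus 4.6 comparison implies.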

Zhipu acknowledged compute constraints at launch: “Even before the GLM-5 launch, we were pushing every chip to its limit just to serve inference.” The rollout to subscription users will be gradual. Notably, GLM-5 was built with full compatibility for Chinese GPU ecosystems (Huawei Ascend, Moore Threads, Hygon, Cambricon, and others), with W4A8 quantization support — a deliberate hedge against US export restrictions on NVIDIA chips.
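W4A8 means 4-bit weights with 8-bit activations. A minimal "fake-quantization" sketch (symmetric round-to-nearest with per-row scales) shows the idea; real W4A8 kernels keep packed integer tensors and fuse the scales into the matmul, and GLM-5's actual quantization scheme is not described in the source:

```python
import numpy as np

def fake_quant(x, bits, axis=None):
    """Symmetric round-to-nearest quantization, then dequantization."""
    qmax = 2 ** (bits - 1) - 1                        # 7 for int4, 127 for int8
    scale = np.max(np.abs(x), axis=axis, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)          # avoid divide-by-zero
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 64))     # a weight matrix
a = rng.standard_normal((8, 64))      # a batch of activations
w4 = fake_quant(w, bits=4, axis=1)    # 4-bit weights, per-output-row scales
a8 = fake_quant(a, bits=8, axis=1)    # 8-bit activations, per-token scales
err = np.abs(a8 @ w4.T - a @ w.T).mean()
print(f"mean matmul error: {err:.3f}")
```

Storing weights at 4 bits cuts weight memory roughly 4× versus FP16, which is what makes a 744B-parameter model practical to serve on the domestic accelerators listed above.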

Market reaction was immediate: Zhipu’s Hong Kong-listed shares surged roughly 28–34% on the day of the announcement, pushing the company’s valuation to approximately US$23 billion.
