Alibaba’s Qwen team has released Qwen3-Coder-Next, a new open-weight coding model that achieves remarkable efficiency through an ultra-sparse Mixture-of-Experts (MoE) architecture — activating only 3 billion of its 80 billion total parameters per forward pass. Released in February 2026 under the Apache 2.0 license, the model is purpose-built for coding agents and local development, scoring 70.6% on SWE-Bench Verified while delivering throughput comparable to models with 10–20× more active parameters.
Qwen3-Coder-Next is built on the Qwen3-Next-80B-A3B-Base foundation, which introduces a hybrid attention and MoE design that dramatically reduces inference cost without sacrificing capability. Its 48-layer architecture follows a specific layout: 12 repeating blocks, each containing three Gated DeltaNet layers followed by one Gated Attention layer, with every layer paired with an MoE block.
The MoE configuration is notably sparse: out of 512 total experts, only 10 are activated per token (plus 1 shared expert), with a compact expert intermediate dimension of just 512. The attention layers use 16 query heads with only 2 key-value heads (grouped query attention), keeping memory bandwidth low during inference. The model natively supports a 256K token context window (262,144 tokens), making it well-suited for large codebases and long agentic reasoning chains.
| Specification | Detail |
|---|---|
| Total Parameters | 80B (79B non-embedding) |
| Activated Parameters | 3B per token |
| Architecture | Hybrid Attention + MoE |
| Context Length | 262,144 tokens |
| Total Experts | 512 (10 activated + 1 shared) |
| License | Apache 2.0 |
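
The layer layout and sparsity figures above can be sketched in a few lines of Python (the names are illustrative, not actual config keys from the released checkpoint):

```python
# Sketch of the 48-layer hybrid layout: 12 repeating blocks of
# [DeltaNet, DeltaNet, DeltaNet, Attention], each layer paired with an MoE block.
NUM_BLOCKS = 12
BLOCK_PATTERN = ["gated_deltanet"] * 3 + ["gated_attention"]
layers = BLOCK_PATTERN * NUM_BLOCKS

assert len(layers) == 48
print(layers.count("gated_deltanet"))   # 36 linear-attention layers
print(layers.count("gated_attention"))  # 12 full-attention layers

# Expert-level sparsity: 10 routed + 1 shared expert active out of 512 routed.
experts_active = 10 + 1
expert_fraction = experts_active / 512
print(f"{expert_fraction:.1%} of experts active per token")
```

The takeaway: three-quarters of the layers use linear-time DeltaNet attention, and only about 2% of the routed experts fire per token, which is how the model keeps per-token compute so low.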
What sets Qwen3-Coder-Next apart from standard instruction-tuned models is its agentic training methodology. Rather than relying on parameter scaling alone, the Qwen team built around 800,000 verifiable tasks paired with executable environments, enabling the model to learn from real environment interactions and reinforcement learning signals.
This training approach focuses on skills critical for real-world agent use: long-horizon reasoning, complex tool usage, and recovery from execution failures. The result is a model that handles dynamic, multi-step coding tasks — not just isolated completions — making it well-suited for integration into CLI and IDE-based agent frameworks such as Claude Code, Qwen Code, Cline, Kilo, and Trae.
On SWE-Bench Verified (using the SWE-Agent scaffold), Qwen3-Coder-Next scores 70.6%, placing it competitively among frontier coding agents. It also achieves 62.8% on SWE-Bench Multilingual and 44.3% on the more demanding SWE-Bench Pro, all while activating just 3B parameters — a fraction of what competing models require.
The practical implications of Qwen3-Coder-Next’s architecture are significant. By activating only 3B parameters per forward pass, the model achieves roughly 10× higher throughput compared to dense models of equivalent quality — a major advantage for developers running local inference or managing agent fleets at scale.
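
A quick back-of-envelope calculation makes the efficiency claim concrete (a sketch; real throughput also depends on memory bandwidth, batch size, and kernel efficiency):

```python
# Fraction of weights touched per forward pass: 3B active of 80B total.
TOTAL_PARAMS_B = 80
ACTIVE_PARAMS_B = 3

pct_active = 100 * ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"{pct_active}% of parameters active per token")  # 3.75%

# Naive compute ratio vs. a dense model of the same total size:
speedup = TOTAL_PARAMS_B / ACTIVE_PARAMS_B
print(f"~{speedup:.0f}x fewer weight FLOPs per token")
```

In practice the realized speedup is lower than the raw FLOP ratio, since all 80B parameters must still reside in memory and routing adds overhead, but the per-token compute reduction is what drives the throughput advantage.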
The model is deployable via popular serving frameworks. With SGLang (v0.5.8+) and vLLM (v0.15.0+), Qwen3-Coder-Next can be served with tool-calling support enabled out of the box using the `qwen3_coder` parser. Four model variants are available on Hugging Face, including quantized (FP8) and base versions, with AMD Instinct GPU support available from day one.
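
A launch might look like the following. This is a sketch: the Hugging Face repo ID is a placeholder, and flag names follow current vLLM/SGLang conventions, so check your installed version's documentation:

```shell
# vLLM (v0.15.0+), OpenAI-compatible server with tool calling enabled.
# "Qwen/Qwen3-Coder-Next" is an assumed repo ID -- substitute the actual one.
vllm serve Qwen/Qwen3-Coder-Next \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder

# SGLang (v0.5.8+) equivalent:
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-Coder-Next \
  --tool-call-parser qwen3_coder
```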
Recommended inference parameters are `temperature=1.0`, `top_p=0.95`, and `top_k=40`. The model operates in non-thinking mode only — there are no `<think>` reasoning blocks — keeping outputs clean and directly usable by agent scaffolds.
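
With those settings, a chat request to an OpenAI-compatible endpoint (as exposed by vLLM or SGLang) might be assembled like this; the model name and prompt are illustrative placeholders:

```python
# Request body using the recommended decoding parameters.
# POST this to {base_url}/v1/chat/completions on your running server.
payload = {
    "model": "Qwen3-Coder-Next",  # placeholder; use your served model name
    "messages": [
        {"role": "user", "content": "Refactor this function to remove the nested loops."}
    ],
    # Recommended sampling settings:
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 40,  # accepted as an extra body field by vLLM/SGLang servers
}
print(payload["temperature"], payload["top_p"], payload["top_k"])
```

Because there are no `<think>` blocks to strip, the assistant message content can be passed straight to an agent scaffold without post-processing.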
For the open-source AI community, Qwen3-Coder-Next represents an important step toward making frontier-grade coding agents viable on consumer and prosumer hardware. Its Apache 2.0 license permits commercial use for enterprises and indie developers alike, lowering the barrier to building sophisticated, locally hosted AI coding workflows.
