Mistral Small 4: Four Models Unified in One Open-Source MoE

On March 16, 2026, Mistral AI released Mistral Small 4 — a 119-billion-parameter Mixture-of-Experts model that unifies instruction following, reasoning, multimodal understanding, and agentic coding into a single deployment. With only 6 billion active parameters per token (8B including embedding layers), it delivers frontier-class performance at a fraction of the cost and latency of larger models. The model is released under the Apache 2.0 license.
Four Models in One
Mistral Small 4 consolidates four previously separate model families into a single architecture:
- Mistral Small — fast instruction following
- Magistral — step-by-step reasoning
- Pixtral — multimodal (text + image) understanding
- Devstral — agentic coding workflows
This means developers no longer need to route requests between specialized models. A single deployment handles general chat, document analysis, code generation, and complex reasoning tasks.
Architecture and Performance
The model uses a granular MoE architecture with 128 experts and 4 active per token, keeping compute costs low while maintaining a large total parameter budget. It supports a 256K-token context window and accepts both text and image inputs.
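The routing scheme above can be sketched in a few lines. This is a toy illustration only: the expert count (128) and top-k (4) come from the article, while the gating function, logit values, and everything else are invented for the sketch and bear no relation to the real router weights.

```python
# Toy sketch of granular MoE routing: 128 experts, top-4 active per token.
# Expert count and top-k are from the article; the scoring is made up.
import math
import random

NUM_EXPERTS = 128
TOP_K = 4

def route_token(router_logits):
    """Pick the top-k experts by logit and softmax-normalize their gate weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:TOP_K]
    exps = [math.exp(router_logits[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
gates = route_token(logits)
print(len(gates))                     # 4 experts active for this token
print(round(sum(gates.values()), 6))  # gate weights sum to 1.0

# Why per-token compute stays low: only ~6B of the 119B total parameters
# are exercised per token, roughly a 5% footprint.
print(round(6 / 119 * 100, 1))        # 5.0 (% of total parameters active)
```

Each token pays only for its four selected experts, which is how the model keeps a 119B parameter budget while behaving like a ~6B model at inference time.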
Compared to its predecessor Mistral Small 3, Small 4 delivers:
- 40% lower end-to-end latency in latency-optimized configurations
- 3x higher throughput (requests per second) in throughput-optimized setups
On benchmarks, Mistral Small 4 matches or surpasses GPT-OSS 120B across AA LCR, LiveCodeBench, and AIME 2025 — while generating significantly shorter outputs. On AA LCR, Small 4 scores 0.72 with just 1.6K characters of output, where comparable Qwen models need 5.8–6.1K characters for similar scores. On LiveCodeBench, it outperforms GPT-OSS 120B while producing 20% less output.
Configurable Reasoning
A standout feature is the reasoning_effort parameter, which lets developers control the depth of reasoning on a per-request basis:
- “none” — fast, lightweight responses comparable to Mistral Small 3.2
- “high” — deep step-by-step reasoning at Magistral-level depth
This eliminates the need to maintain separate fast and reasoning model deployments, simplifying infrastructure and reducing operational overhead.
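A per-request toggle like this might look as follows. The `reasoning_effort` values ("none", "high") are from the article; the model id, payload shape, and helper function are assumptions modeled on typical OpenAI-compatible chat APIs, not a confirmed Mistral API schema.

```python
# Hedged sketch: one deployment, reasoning depth chosen per request.
# Payload shape and model id are assumptions; "reasoning_effort" values
# come from the article.
import json

def build_request(prompt, effort="none"):
    """Build a chat request dict; effort is 'none' (fast) or 'high' (deep)."""
    return {
        "model": "mistral-small-4",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # "none" = fast, "high" = step-by-step
    }

fast = build_request("Summarize this paragraph.")                 # quick reply
deep = build_request("Prove the claim step by step.", "high")     # deep reasoning
print(json.dumps(deep, indent=2))
```

The same endpoint serves both calls, so latency-sensitive traffic and reasoning-heavy traffic share one deployment instead of two model pools.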
Self-Hosting and Availability
Mistral Small 4 can run on relatively modest hardware for a 119B model. Minimum requirements include 4x NVIDIA HGX H100, 2x HGX H200, or a single DGX B200. The model is compatible with popular serving frameworks including vLLM, llama.cpp, SGLang, and Transformers.
It is available through the Mistral API, AI Studio, Hugging Face, NVIDIA’s build.nvidia.com for free prototyping, and as an NVIDIA NIM container for production deployment.
Related Coverage
- Mistral Small 3.2: Minor Update, Major Improvements for Local LLMs — the previous generation of Mistral’s small model family
- Magistral-Small-2506: Mistral AI’s Compact Reasoning Powerhouse — the reasoning model now unified into Small 4
- Exploring Devstral Small 1.1 by Mistral AI — the coding model lineage folded into Small 4


