On March 16, 2026, Mistral AI released Mistral Small 4 — a 119-billion-parameter Mixture-of-Experts model that unifies instruction following, reasoning, multimodal understanding, and agentic coding into a single deployment. With only 6 billion active parameters per token (8B including embedding layers), it delivers frontier-class performance at a fraction of the cost and latency of larger models. The model is released under the Apache 2.0 license.
Mistral Small 4 consolidates four previously separate model families into a single architecture, covering instruction following, reasoning, multimodal understanding, and agentic coding.
This means developers no longer need to route requests between specialized models. A single deployment handles general chat, document analysis, code generation, and complex reasoning tasks.
The model uses a granular MoE architecture with 128 experts and 4 active per token, keeping compute costs low while maintaining a large total parameter budget. It supports a 256K-token context window and accepts both text and image inputs.
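The compute savings follow directly from the expert ratio: only 4 of 128 experts fire per token, so roughly 3% of the expert pool is active on any forward pass. A minimal top-k routing sketch in pure Python (illustrative only; Mistral's actual router and gating details are not public):

```python
import math
import random

NUM_EXPERTS = 128    # total experts in the MoE layer
ACTIVE_EXPERTS = 4   # experts selected per token

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(logits, k=ACTIVE_EXPERTS):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return {i: probs[i] / total for i in top}

random.seed(0)
token_logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
gates = route(token_logits)
print(len(gates))                                   # 4 experts active
print(f"{ACTIVE_EXPERTS / NUM_EXPERTS:.1%} of experts used per token")  # 3.1%
```

The 6B-active-of-119B-total parameter split reported for the model reflects the same idea at scale: each token only pays for the experts it is routed to, not for the full parameter budget.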
Small 4 also delivers broad improvements over its predecessor, Mistral Small 3.
On benchmarks, Mistral Small 4 matches or surpasses GPT-OSS 120B across AA LCR, LiveCodeBench, and AIME 2025 — while generating significantly shorter outputs. On AA LCR, Small 4 scores 0.72 with just 1.6K characters of output, where comparable Qwen models need 5.8–6.1K characters for similar scores. On LiveCodeBench, it outperforms GPT-OSS 120B while producing 20% less output.
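The output-efficiency claim can be made concrete with quick arithmetic on the figures above (the 20% LiveCodeBench number is quoted from the announcement; the ratios below are derived from the AA LCR character counts):

```python
# Characters of output needed for comparable AA LCR scores, per the article.
small4_chars = 1_600
qwen_chars = (5_800, 6_100)

ratios = [q / small4_chars for q in qwen_chars]
print(f"Qwen outputs are {ratios[0]:.1f}x-{ratios[1]:.1f}x longer")  # 3.6x-3.8x
```

Shorter outputs translate directly into lower per-request cost and latency, since decoding time scales with the number of generated tokens.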
A standout feature is the reasoning_effort parameter, which lets developers control the depth of reasoning on a per-request basis.
This eliminates the need to maintain separate fast and reasoning model deployments, simplifying infrastructure and reducing operational overhead.
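A request might look like the sketch below, following the shape of an OpenAI-style chat completions payload. The model identifier and the "low"/"high" effort values are assumptions for illustration, not confirmed API details:

```python
import json

# Hypothetical payload; the model name and the reasoning_effort values
# ("low"/"high") are assumptions, not documented API constants.
payload = {
    "model": "mistral-small-4",
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
    "reasoning_effort": "high",  # spend more reasoning on a hard request
}

# A cheap follow-up hits the SAME deployment, just with less reasoning:
quick = dict(
    payload,
    reasoning_effort="low",
    messages=[{"role": "user", "content": "What's 2 + 2?"}],
)

print(json.dumps(payload, indent=2))
```

One deployment serving both ends of the latency/quality trade-off is the operational win here: routing logic moves from infrastructure (two model pools) into a single request field.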
Mistral Small 4 can run on relatively modest hardware for a 119B model. Minimum requirements include 4x NVIDIA HGX H100, 2x HGX H200, or a single DGX B200. The model is compatible with popular serving frameworks including vLLM, llama.cpp, SGLang, and Transformers.
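The hardware floor is consistent with simple weight-memory arithmetic, assuming FP8 weights (one byte per parameter) and reading the configurations as four 80 GB H100s, two 141 GB H200s, and one 8-GPU DGX B200; the actual serving precision and per-GPU memory headroom are assumptions here:

```python
PARAMS_B = 119        # total parameters, in billions
BYTES_PER_PARAM = 1   # FP8 assumption; BF16 would double this

weights_gb = PARAMS_B * BYTES_PER_PARAM  # ~119 GB for weights alone

# Assumed aggregate GPU memory per cited configuration.
configs = {
    "4x H100 (80 GB each)": 4 * 80,
    "2x H200 (141 GB each)": 2 * 141,
    "1x DGX B200 (8x 180 GB)": 8 * 180,
}
for name, mem in configs.items():
    # Real deployments also need headroom for KV cache and activations.
    print(f"{name}: {mem} GB total, fits FP8 weights: {mem >= weights_gb}")
```

All three configurations clear the FP8 weight footprint with room left for the KV cache, which at a 256K context is itself a significant memory consumer.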
It is available through the Mistral API, AI Studio, Hugging Face, NVIDIA’s build.nvidia.com for free prototyping, and as an NVIDIA NIM container for production deployment.
