Mistral AI has quietly rolled out Mistral Small 3.2, a refined successor to its popular Small 3.1 model. Released on June 20, 2025, this update focuses on improving instruction following, reducing repetition errors, and strengthening function-calling robustness, making it an even more reliable choice for developers running large language models locally (simonwillison.net, huggingface.co).
Mistral AI recommends running Small 3.2 with a low temperature—around 0.15—to strike the best balance between creativity and reliability. A suggested system prompt reminding the model of its knowledge cutoff (“last updated on 2023-10-01”) is also provided for more consistent outputs (simonwillison.net).
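To see why a temperature as low as 0.15 favors reliability over creativity, here is a minimal sketch of temperature sampling (illustrative only, not vLLM's actual implementation):

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax. Lower temperature
    sharpens the distribution toward the top-scoring token."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=None):
    """Draw one token index from the temperature-adjusted distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point rounding
```

With logits `[2.0, 1.0, 0.5]`, temperature 1.0 leaves the top token around 63% probability, while 0.15 pushes it above 99%, which is why low-temperature outputs are far more repeatable.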
| Metric | Small 3.1 | Small 3.2 |
|---|---|---|
| Wildbench v2 Instruction Following | 55.60% | 65.33% |
| Arena Hard v2 Instruction Following | 19.56% | 43.10% |
| Internal Accuracy (IF) | 82.75% | 84.78% |
| Infinite Generation Rate (Lower Is Better) | 2.11% | 1.29% |
| MMLU Pro (5-shot CoT) | 66.76% | 69.06% |
| MBPP Plus – Pass@5 | 74.63% | 78.33% |
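The MBPP Plus row reports pass@5. For context, the standard unbiased pass@k estimator (introduced for the HumanEval/Codex evaluations; whether Mistral's harness uses exactly this estimator is an assumption) can be computed as:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations of which c are
    correct, passes the tests. Computed as 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # samples must include at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 generations of which 1 is correct, `pass_at_k(10, 1, 5)` gives 0.5: half of all 5-sample draws include the single passing solution.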
These gains show that although Small 3.2 is only a "minor" version bump, its real-world usability improves noticeably, especially for code generation and instruction-driven tasks (huggingface.co).
You can find Mistral Small 3.2 on Hugging Face under the mistralai/Mistral-Small-3.2-24B-Instruct-2506 repository:
https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
The model is available in FP16 and FP8, and quantized GGUF builds make it feasible to run on machines with as little as 16 GB of RAM (simonwillison.net).
Mistral recommends using vLLM (>=0.9.1) for best performance:
```bash
pip install --upgrade vllm

vllm serve mistralai/Mistral-Small-3.2-24B-Instruct-2506 \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral \
  --enable-auto-tool-choice \
  --limit_mm_per_prompt 'image=10' \
  --tensor-parallel-size 2
```
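Once the server is up, it exposes an OpenAI-compatible API (at `http://localhost:8000/v1` by default). A minimal sketch of building a chat request that applies the recommended low temperature and a knowledge-cutoff system prompt (the exact prompt wording below is illustrative, not Mistral's official text):

```python
import json

MODEL = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"

def build_chat_request(user_message: str, temperature: float = 0.15) -> dict:
    """Assemble a chat-completion payload for vLLM's
    OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "model": MODEL,
        "temperature": temperature,
        "messages": [
            # Illustrative system prompt reminding the model of its
            # 2023-10-01 knowledge cutoff, per Mistral's recommendation.
            {"role": "system",
             "content": "You are Mistral Small 3.2, last updated on 2023-10-01."},
            {"role": "user", "content": user_message},
        ],
    }

payload = build_chat_request("Write a haiku about local LLMs.")
print(json.dumps(payload, indent=2))
# POST this to http://localhost:8000/v1/chat/completions,
# e.g. requests.post(url, json=payload).
```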
Running on GPU in fp16/bf16 requires roughly 55 GB of GPU memory. Alternatively, you can use the transformers library with minimal code changes.
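A quick back-of-the-envelope check makes these memory figures plausible (a sketch only; the real footprint also depends on KV cache, context length, and runtime buffers):

```python
def model_memory_gb(n_params: float, bits_per_param: int,
                    overhead: float = 1.0) -> float:
    """Rough weight-only memory estimate in GB (decimal). Activations,
    KV cache, and runtime buffers come on top of this."""
    return n_params * (bits_per_param / 8) * overhead / 1e9

# 24B parameters in fp16/bf16: ~48 GB of weights alone, consistent
# with the ~55 GB total once runtime overhead is included.
print(model_memory_gb(24e9, 16))  # 48.0
# A 4-bit quantization shrinks the weights to ~12 GB, which is why a
# 16 GB machine becomes feasible.
print(model_memory_gb(24e9, 4))   # 12.0
```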
With its 24 billion parameters, Mistral Small remains one of the most accessible yet capable open-source LLMs, striking a balance between performance and hardware requirements. Version 3.2's refinements make it an even stronger candidate for instruction-driven workflows, function calling, and code generation.
As the open-source community continues to push boundaries, having a dependable, locally runnable model like Mistral Small 3.2 is invaluable for both hobbyists and enterprise teams.