Xiaomi Releases MiMo-V2.5-Pro: 1T-Parameter Open MoE Matches Frontier Coding Models

On April 22, 2026, Xiaomi released MiMo-V2.5-Pro, a 1.02-trillion-parameter Mixture-of-Experts model with 42 billion active parameters that matches frontier closed models on coding and agentic benchmarks at a fraction of their cost. The release also unifies reasoning and multimodal abilities in a single open-weights model with a 1-million-token context window.

What Was Released

MiMo-V2.5-Pro is the flagship of Xiaomi’s MiMo line, succeeding MiMo-V2-Pro and the smaller MiMo-V2-Flash. The model is fully open-sourced under a permissive license, with weights, tokenizer, and model card published on Hugging Face. Xiaomi simultaneously released MiMo-V2.5, a non-Pro variant, and rolled the new model out across its API platform and AI Studio at unchanged pricing — $1 per million input tokens and $3 per million output tokens.
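
Xiaomi has not published client documentation in the announcement itself, but most open-weights API platforms expose an OpenAI-compatible endpoint. Here is a minimal sketch under that assumption; the base URL and model identifier below are placeholders, not documented values.

```python
# Hypothetical first call to MiMo-V2.5-Pro via an OpenAI-compatible
# endpoint. The base_url and model name are placeholder assumptions;
# check Xiaomi's API platform docs for the real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.mimo.example/v1",  # placeholder, not a real URL
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="mimo-v2.5-pro",  # assumed model identifier
    messages=[{"role": "user", "content": "Explain this failing test: ..."}],
)
print(resp.choices[0].message.content)
```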

The headline architectural numbers: 1.02 trillion total parameters, 42 billion active per token, FP8 (E4M3) mixed precision, and a hybrid attention design that interleaves local sliding-window and global attention at a 6:1 ratio. The model was trained on 27 trillion tokens, with the base version supporting 256K context and the Pro release extended to 1M.
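
A 6:1 interleave means six consecutive sliding-window layers for every global layer, so only every seventh layer attends over the full context. A toy sketch of that layer schedule follows; the window size and layer count are assumed for illustration, not published figures.

```python
# Illustrative sketch of a 6:1 local/global attention interleave.
# Window size and layer count are made-up values for illustration;
# Xiaomi has not published the exact schedule.
from dataclasses import dataclass

@dataclass
class LayerSpec:
    index: int
    kind: str           # "sliding_window" or "global"
    window: int | None  # attention span in tokens; None = full context

def build_schedule(num_layers: int, local_per_global: int = 6,
                   window: int = 4096) -> list[LayerSpec]:
    layers = []
    for i in range(num_layers):
        # Every (local_per_global + 1)-th layer attends globally;
        # the rest use a local sliding window.
        if (i + 1) % (local_per_global + 1) == 0:
            layers.append(LayerSpec(i, "global", None))
        else:
            layers.append(LayerSpec(i, "sliding_window", window))
    return layers

schedule = build_schedule(num_layers=28)
print([layer.kind for layer in schedule[:8]])
# six 'sliding_window' entries, one 'global', then local again
```

The usual rationale for such hybrids is that the local layers keep attention cost and KV-cache growth near-linear in sequence length, while the occasional global layer preserves long-range information flow across the full window.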

Benchmarks and Token Efficiency

On SWE-bench Pro — a coding benchmark where models fix real bugs in actual startup codebases — MiMo-V2.5-Pro resolves 57.2% of tasks, putting it in the same neighborhood as Claude Opus 4.6 and Gemini 3.1 Pro. On τ3-Bench it scores 72.9, and on Xiaomi’s internal MiMo Coding Bench it leads the open-weights field. On reasoning, it scores 48.0% on Humanity’s Last Exam, behind GPT-5.4’s 58.7% but competitive among open models.

The more interesting story is token economy. On ClawEval, MiMo-V2.5-Pro hits 64% Pass³ using roughly 70,000 tokens per trajectory — 40 to 60 percent fewer tokens than Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 at comparable capability levels. Xiaomi reports the model uses 42% fewer tokens than Kimi K2.6 at equivalent scores.

Chart: MiMo-V2.5-Pro token efficiency and benchmark comparison (image credit: Xiaomi MiMo)
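
At the published pricing, those trajectory lengths translate directly into dollars. A back-of-the-envelope sketch, with the simplifying assumption that the full ~70,000 tokens bill at the $3-per-million output rate:

```python
# Back-of-the-envelope trajectory cost at Xiaomi's published pricing.
# Simplifying assumption: all ~70k trajectory tokens bill as output;
# real trajectories mix in cheaper input tokens, so treat this as rough.
TOKENS_PER_TRAJECTORY = 70_000
OUTPUT_PRICE_PER_M_TOKENS = 3.00  # USD per million output tokens

cost = TOKENS_PER_TRAJECTORY / 1_000_000 * OUTPUT_PRICE_PER_M_TOKENS
print(f"~${cost:.2f} per trajectory")  # ~$0.21

# At the claimed 40-60% token savings, a competitor reaching the same
# score would burn roughly 1.7x to 2.5x as many tokens:
for savings in (0.40, 0.60):
    print(f"{savings:.0%} savings -> competitor uses "
          f"~{TOKENS_PER_TRAJECTORY / (1 - savings):,.0f} tokens")
```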

Long-Horizon Agentic Work

The most concrete demonstrations come from long-running agentic tasks. Xiaomi reports the model completed a full SysY compiler implementation in Rust, passing all 233 unit tests across 672 tool calls over 4.3 hours. In another run, it built an 8,192-line desktop video editing application across 1,868 tool calls and 11.5 hours. A third demo had the model optimize an analog FVF-LDO (flipped voltage follower low-dropout regulator) circuit through closed-loop simulation in roughly an hour, a task usually framed as graduate-level EDA work.

Xiaomi attributes the durability to what it calls “harness awareness” — the model actively manages its tool environment, context, and intermediate state rather than executing instructions in isolation. The model is compatible with Claude Code, OpenCode, and Kilo agent harnesses out of the box.
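
In practice, each of those harnesses runs some variant of the same loop: send the conversation plus tool schemas, execute whatever tool call comes back, append the result, and repeat until the model stops requesting tools. A minimal sketch of that loop against an assumed OpenAI-compatible endpoint (the URL, model name, and single shell tool are illustrative, not Xiaomi's published interface):

```python
# Minimal agent-harness loop sketch. The endpoint, model name, and the
# single shell tool are illustrative assumptions, not Xiaomi's spec.
import json
import subprocess
from openai import OpenAI

client = OpenAI(base_url="https://api.mimo.example/v1",  # placeholder
                api_key="YOUR_API_KEY")

TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

messages = [{"role": "user", "content": "Run the test suite and fix failures."}]

while True:
    resp = client.chat.completions.create(
        model="mimo-v2.5-pro",  # assumed identifier
        messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:       # no more tool calls: the model is done
        print(msg.content)
        break
    for call in msg.tool_calls:  # execute each requested tool call
        args = json.loads(call.function.arguments)
        out = subprocess.run(args["command"], shell=True,
                             capture_output=True, text=True)
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": out.stdout + out.stderr})
```

On Xiaomi's framing, "harness awareness" means the model is trained to behave well inside this loop, budgeting its context and carrying intermediate state across hundreds of iterations rather than treating each turn as independent.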

What This Means

The pattern emerging across Chinese open-weights labs in 2026 (DeepSeek, Kimi, Qwen, and now Xiaomi) is that frontier-equivalent capability at significantly lower token cost is increasingly available without an API contract. MiMo-V2.5-Pro's combination of open weights, a 1M-token context window, native multimodal support, and aggressive token efficiency makes it a credible substitute for closed coding agents in research and self-hosted production settings, particularly where long-horizon autonomy matters more than raw single-turn reasoning.
