MiniMax releases M2.5, a frontier-class agentic AI model that matches the performance of leading proprietary systems at one-tenth to one-twentieth of their cost. Announced on February 12, 2026, M2.5 sets new benchmarks across coding, tool use, and real-world office productivity — while shipping as open weights under a modified MIT license.
MiniMax M2.5 is the latest generation of Shanghai-based MiniMax’s flagship agentic reasoning model. It is a Mixture-of-Experts (MoE) architecture with 229 billion total parameters and 10 billion active parameters, paired with a 200,000-token context window. The model is designed specifically for complex, multi-step agentic tasks — autonomous workflows that require tool use, web search, code execution, and document editing in sequence.
M2.5 ships in two API variants: a standard version running at 50 tokens per second and an M2.5-Lightning variant at 100 tokens per second — double the inference speed of comparable frontier models. Pricing starts at $0.30 per million input tokens and $1.10 per million output tokens for Lightning, placing it far below the cost of models such as Claude Opus 4.6, Gemini 3 Pro, and GPT-5.
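The quoted per-token prices make session costs easy to estimate. A minimal sketch, using the Lightning prices above; the token counts in the example are hypothetical placeholders, not measured values:

```python
# Rough cost estimate for an agentic session on M2.5-Lightning,
# using the per-token prices quoted in the article.

INPUT_PRICE_PER_M = 0.30   # USD per million input tokens (Lightning)
OUTPUT_PRICE_PER_M = 1.10  # USD per million output tokens (Lightning)

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one session at Lightning pricing."""
    return (input_tokens / 1e6) * INPUT_PRICE_PER_M \
         + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M

# Hypothetical long agentic run: 3M input tokens, 0.5M output tokens.
print(f"${session_cost(3_000_000, 500_000):.2f}")  # → $1.45
```

Input tokens dominate in agentic workloads, since the growing context is re-read on every tool-use turn, which is why the input price matters as much as the output price here.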
M2.5 achieves competitive results across the leading agentic benchmarks.
M2.5 is also more efficient: it uses approximately 3.52 million tokens per SWE-Bench task (down from 3.72 million in M2.1) and completes tasks in roughly 22.8 minutes, on par with Claude Opus 4.6. It also cuts search rounds by about 20% relative to M2.1 while achieving better outcomes.
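The generation-to-generation improvement implied by those token figures is modest in relative terms. A quick check using only the numbers quoted above:

```python
# Relative token-efficiency gain of M2.5 over M2.1 on SWE-Bench,
# computed from the per-task figures quoted in the article.

m21_tokens = 3.72e6  # tokens per task, M2.1
m25_tokens = 3.52e6  # tokens per task, M2.5

token_reduction = 1 - m25_tokens / m21_tokens
print(f"token reduction: {token_reduction:.1%}")  # → token reduction: 5.4%
```

So the headline efficiency story is less about raw token count and more about the ~20% fewer search rounds and the lower per-token price.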
Underlying M2.5 is MiniMax’s proprietary training system called Forge, a reinforcement learning framework that delivers a 40x training speedup compared to conventional RL pipelines. Forge enables the model to be trained across 10+ programming languages in over 200,000 real-world software environments, building robust generalization for tasks far outside standard benchmarks.
Forge is built around task decomposition and reward design: the model is trained to break complex goals into sub-tasks and reason about which tools and search strategies to employ at each stage. The result is a model that can operate continuously — MiniMax notes that running M2.5-Lightning at full speed for one hour costs approximately $1.
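The ~$1/hour figure is easy to sanity-check against the Lightning prices and the 100 tokens-per-second generation speed quoted earlier. The input-token volume below is an assumption for illustration, not a published number:

```python
# Back-of-envelope check of the ~$1/hour claim for M2.5-Lightning,
# using the speed and prices quoted in the article.

output_tokens = 100 * 3600               # 100 tok/s sustained for one hour
output_cost = output_tokens / 1e6 * 1.10 # $1.10 per million output tokens

input_tokens = 2_000_000                 # ASSUMED hourly context re-reads
input_cost = input_tokens / 1e6 * 0.30   # $0.30 per million input tokens

print(f"output: ${output_cost:.2f}, input: ${input_cost:.2f}, "
      f"total: ${output_cost + input_cost:.2f}")
# → output: $0.40, input: $0.60, total: $1.00
```

Under these assumptions the output side alone costs about $0.40, leaving room for roughly two million input tokens of context within a $1 hourly budget, consistent with MiniMax's figure.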
M2.5 is fully integrated into the MiniMax Agent platform, which includes a suite of standardized “Office Skills” for tasks in Microsoft Word, PowerPoint, and Excel. Over 10,000 custom agent configurations — called “Experts” — have already been built on the platform by users. The model is also available via the MiniMax API and through third-party providers including Fireworks, Novita, and GMI Cloud.
Weights are released on Hugging Face under a modified MIT License that requires attribution in commercial deployments. Unlike many Chinese frontier model releases, MiniMax is providing open access to the full model weights — not just an API.
MiniMax M2.5 represents a significant moment in the commoditization of frontier AI. Matching Claude Opus 4.6’s coding ability at a fraction of the cost — with faster inference and open weights — makes a compelling case that state-of-the-art agentic performance no longer requires proprietary, closed models. For developers and researchers, this opens up previously cost-prohibitive use cases in autonomous software engineering, enterprise productivity, and research automation.
The release also reinforces a broader pattern: Chinese AI labs are closing, and in some cases reversing, the performance gap with US frontier models — not through parameter brute-force, but through architectural efficiency and reinforcement learning innovation.
