MiniMax M2.7: The First AI Model That Helps Train Itself

MiniMax has released M2.7, the first frontier AI model designed to participate in its own training loop. Announced on March 18, 2026, M2.7 introduces what MiniMax calls "self-evolution": the model autonomously analyzes its own failure trajectories, modifies its scaffold code, and runs evaluations across 100+ iterative rounds, achieving a 30% performance improvement on internal benchmarks without human intervention. With just 10 billion activated parameters, M2.7 matches models many times its size on software engineering and agent benchmarks, at a fraction of the cost.


[Illustration, generated by AI: a self-referential neural network feedback loop with nested glowing spheres and data streams]

What Is Self-Evolution?

Unlike traditional model development, where humans design reward functions and training pipelines, M2.7 takes on 30–50% of the reinforcement learning research workflow itself. During development, MiniMax allowed the model to autonomously update its own memory, construct dozens of complex skills to facilitate RL experiments, and improve based on the results.

In one documented trial, M2.7 ran entirely autonomously — executing an iterative loop of analyzing failure trajectories, planning changes, modifying scaffold code, and running evaluations for over 100 rounds. The result was a 30% performance improvement on internal evaluation sets. MiniMax describes this as “early echoes of self-evolution,” signaling a shift toward models that actively contribute to their own improvement cycle.
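The loop described above (analyze failures, plan changes, modify the scaffold, re-evaluate, keep improvements) can be sketched schematically. This is an illustrative reconstruction under stated assumptions, not MiniMax's actual pipeline: the function names and the toy stand-ins for evaluation and modification are invented for the sketch.

```python
import random

# Illustrative sketch of the self-evolution loop described above.
# All names and the toy stubs are assumptions for illustration,
# NOT MiniMax's actual implementation.

def run_evals(scaffold):
    """Toy stand-in: score a scaffold (here, just its 'quality' field)."""
    return scaffold["quality"]

def analyze_failures(scaffold):
    """Toy stand-in for inspecting failure trajectories."""
    return {"gap": 1.0 - scaffold["quality"]}

def plan_and_modify(scaffold, failures, rng):
    """Toy stand-in: propose an edited scaffold based on the failure analysis."""
    delta = rng.uniform(-0.2, 0.5) * failures["gap"]
    return {"quality": min(1.0, scaffold["quality"] + delta)}

def self_evolve(scaffold, rounds=100, seed=0):
    """Iterate: analyze failures, plan/apply changes, re-evaluate,
    and keep a candidate only if it improves the score."""
    rng = random.Random(seed)
    best = run_evals(scaffold)
    for _ in range(rounds):
        failures = analyze_failures(scaffold)                 # analyze failure trajectories
        candidate = plan_and_modify(scaffold, failures, rng)  # plan + modify scaffold code
        score = run_evals(candidate)                          # run evaluations
        if score > best:                                      # keep only improvements
            scaffold, best = candidate, score
    return best

final_score = self_evolve({"quality": 0.5})
```

The key structural point is the accept-if-better step: because regressions are discarded, the evaluated score is monotonically non-decreasing over the 100+ rounds, which is what makes a long unattended run safe.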

Performance Benchmarks

Despite activating only 10 billion parameters — making it the smallest Tier-1 model — M2.7 delivers frontier-class performance across software engineering, agent workflows, and professional tasks:

  • SWE-Pro: 56.22%, matching GPT-5.3-Codex and nearly reaching Claude Opus 4.6
  • VIBE-Pro (end-to-end project delivery): 55.6%
  • Terminal Bench 2: 57.0%
  • GDPval-AA: ELO 1495, ranking just behind Claude Opus 4.6, Claude Sonnet 4.6, and GPT-5.4
  • MLE Bench Lite: 66.6% average medal rate, second only to Opus 4.6 (75.7%) and GPT-5.4 (71.2%)
  • Skill adherence: 97% compliance across 40+ complex skills exceeding 2,000 tokens each

The model also scores +1 on the AA-Omniscience Index for hallucination resistance — a massive leap from M2.5’s score of −40 — indicating significantly improved factual reliability.

Speed and Pricing

M2.7 runs at 100 tokens per second, roughly 3x faster than Claude Opus. Two API variants are available — M2.7 and M2.7-highspeed — with identical output quality but different latency profiles. The 204K context window supports long-form coding and document analysis tasks.

Pricing remains aggressive: $0.30 per million input tokens and $1.20 per million output tokens, with automatic caching bringing the blended cost down to $0.06 per million tokens. This makes M2.7 one of the most cost-effective frontier models available, costing a fraction of comparable models from Anthropic and OpenAI.
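To see how caching drives the blended figure down, here is a minimal cost sketch using the published per-token rates. Only the $0.30/$1.20 rates and the $0.06 blended figure come from the announcement; the discounted rate for cache-hit input tokens (`cached_input_per_m`) is a hypothetical parameter, since MiniMax's cache pricing is not given here.

```python
# Hedged cost estimator using the published M2.7 rates.
# cached_input_per_m is an ASSUMPTION, not a published number.

INPUT_PER_M = 0.30    # USD per million input tokens (published)
OUTPUT_PER_M = 1.20   # USD per million output tokens (published)

def request_cost(input_tokens, output_tokens,
                 cache_hit_rate=0.0, cached_input_per_m=0.03):
    """Estimate one request's cost in USD.

    cache_hit_rate: fraction of input tokens served from cache.
    cached_input_per_m: hypothetical discounted rate for cached input.
    """
    fresh = input_tokens * (1 - cache_hit_rate)
    cached = input_tokens * cache_hit_rate
    return (fresh * INPUT_PER_M
            + cached * cached_input_per_m
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# Agent workflows that re-send a long, stable prefix every turn
# benefit the most from caching:
no_cache = request_cost(100_000, 2_000)
with_cache = request_cost(100_000, 2_000, cache_hit_rate=0.9)
```

Under these assumptions, a request with 100K input and 2K output tokens drops from about 3.2 cents uncached to under 1 cent at a 90% cache-hit rate, which is how a blended rate far below the headline input price becomes plausible.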

What This Means

M2.7 represents a notable shift in how AI models are developed. The self-evolution capability — where a model contributes meaningfully to its own training pipeline — has been a theoretical goal in AI research for years. MiniMax is the first major lab to publicly demonstrate and ship a model with this property.

However, M2.7 is proprietary, a departure from MiniMax’s earlier open-weights releases (M2 and M2.5). The model is available through the MiniMax Agent platform and API, and integrates with third-party coding tools including Claude Code, Cline, and Cursor.

For developers, the combination of frontier performance, aggressive pricing, and fast inference makes M2.7 a compelling option — especially for agent workflows and software engineering tasks where it competes directly with models costing 10–20x more.
