MiniMax M2.7 Ships as Open Weights: Frontier Agentic Model on Hugging Face

MiniMax M2.7 is now downloadable. The 230B-parameter self-evolving model that MiniMax unveiled in March has shipped as open weights on Hugging Face and ModelScope under a Modified-MIT license, and community quantizations already cover everything from a 60 GB 1-bit build to the full 457 GB BF16 release — putting a frontier-class agentic model in reach of anyone with enough VRAM or RAM to host it.

MiniMax M2.7 release banner
Image credit: MiniMax

What’s in the Release

The official MiniMaxAI/MiniMax-M2.7 repository hosts the 229B-parameter sparse mixture-of-experts model in F32, BF16, and F8_E4M3 tensor formats. Only 10B parameters activate per token (8 of 256 local experts across 62 layers), and the context window is 200K tokens. The license is a modified MIT — permissive enough for research and most commercial use, with the usual attribution and trademark carve-outs.
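Those sparsity figures are easy to sanity-check with back-of-envelope arithmetic. The sketch below uses only the numbers from the model card (229B total, 10B active, 8 of 256 experts); it is plain arithmetic, not MiniMax tooling:

```python
# Figures from the model card: 229B total params, 10B active per token,
# 8 of 256 local experts routed per layer.
total_params = 229e9
active_params = 10e9
experts_routed, experts_total = 8, 256

expert_fraction = experts_routed / experts_total  # share of experts that fire
active_fraction = active_params / total_params    # share of weights that fire
# The gap between the two fractions is covered by attention and other
# shared (always-active) parameters.

# BF16 stores 2 bytes per parameter, which lands within rounding error
# of the listed 457 GB full-precision release.
bf16_gb = total_params * 2 / 1e9

print(f"{expert_fraction:.1%} of experts, {active_fraction:.1%} of weights per token")
print(f"BF16 footprint ≈ {bf16_gb:.0f} GB")
```

Running it gives roughly 3.1% of experts and 4.4% of weights active per token, and a ~458 GB BF16 footprint, consistent with the release sizes quoted below.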

Recommended inference runtimes are SGLang and vLLM, with Transformers and standard Hugging Face tooling also supported. MiniMax publishes reference sampling parameters — temperature=1.0, top_p=0.95, top_k=40 — worth noting because most agent harnesses default to temperature=0, which will underperform the reported benchmark numbers.
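Wiring those published values into a harness looks something like the sketch below. The dict mirrors the model card's parameters; the commented-out vLLM call is illustrative only and assumes you have the weights downloaded and a GPU setup large enough to serve them:

```python
# Reference sampling parameters published in the MiniMax model card.
MINIMAX_M27_SAMPLING = {
    "temperature": 1.0,  # not the temperature=0 most agent harnesses default to
    "top_p": 0.95,
    "top_k": 40,
}

# Illustrative vLLM usage (requires vllm, the downloaded weights, and
# enough GPU memory -- tensor_parallel_size here is a placeholder):
# from vllm import LLM, SamplingParams
# llm = LLM(model="MiniMaxAI/MiniMax-M2.7", tensor_parallel_size=8)
# outputs = llm.generate(["your prompt"], SamplingParams(**MINIMAX_M27_SAMPLING))
```

The same three values drop into SGLang or Transformers `generate()` kwargs; the point is simply to override whatever near-greedy defaults your harness ships with.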

Quantizations: From 60 GB to 457 GB

Within days of release, unsloth’s GGUF conversions appeared with 22 quantization variants. The practical landing zones:

  • UD-IQ1_M (60.7 GB) — fits in a single 80 GB H100 or a 64 GB unified-memory Mac, at the cost of measurable quality loss.
  • UD-Q2_K_XL (75.3 GB) — the sweet spot for 96 GB workstations and dual-GPU rigs.
  • UD-Q4_K_M (140 GB) — the community’s usual “near-lossless” target; needs 2× 80 GB cards or a 192 GB Mac Studio.
  • Q8_0 (243 GB) — full 8-bit fidelity for serious evaluation work.
  • BF16 (457 GB) — the reference weights, for fine-tuning and distillation.
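Choosing among those files is mostly a memory-budget exercise. Here is a small helper — my own sketch, not MiniMax or unsloth tooling — that picks the largest variant fitting a given budget, reserving roughly 10% headroom for KV cache and runtime overhead (the headroom factor is an assumption, not a published figure):

```python
# (name, on-disk size in GB) for the main GGUF landing zones above,
# sorted smallest-first.
QUANTS = [
    ("UD-IQ1_M", 60.7),
    ("UD-Q2_K_XL", 75.3),
    ("UD-Q4_K_M", 140.0),
    ("Q8_0", 243.0),
    ("BF16", 457.0),
]

def pick_quant(mem_gb, headroom=0.9):
    """Return the largest quant whose weights fit in mem_gb * headroom,
    or None if even the smallest build is too big."""
    budget = mem_gb * headroom
    best = None
    for name, size_gb in QUANTS:
        if size_gb <= budget:
            best = name
    return best

print(pick_quant(80))   # single 80 GB H100 -> UD-IQ1_M
print(pick_quant(96))   # 96 GB workstation -> UD-Q2_K_XL
print(pick_quant(192))  # 192 GB Mac Studio -> UD-Q4_K_M
```

The outputs line up with the hardware pairings in the list above, which is a decent sanity check that the 10% reserve is in the right ballpark for weights-only sizing.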

Because the MoE architecture activates only 10B parameters per token, inference throughput scales closer to a 10B dense model than to a 230B one — one of the reasons a 1-bit quant on consumer-adjacent hardware is even a reasonable conversation.
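The scaling claim follows from memory arithmetic: single-stream decode is typically bound by how many weight bytes must stream from memory per token, and a 10B-active model reads only ~4% of what a dense 229B model would. A rough ceiling estimate, assuming fully bandwidth-bound decode with 8-bit weights (the ~3.35 TB/s figure is the H100 SXM HBM spec, not a measured number, and the estimate ignores KV cache and whether the weights even fit on one device):

```python
# Rough decode ceiling: tokens/s ≈ memory bandwidth / weight bytes per token.
bandwidth_gb_s = 3350        # H100 SXM HBM3, approximate spec sheet figure
bytes_per_param = 1          # 8-bit quantized weights

active_gb = 10e9 * bytes_per_param / 1e9   # MoE: ~10B params touched per token
dense_gb = 229e9 * bytes_per_param / 1e9   # hypothetical dense model of same size

print(f"MoE decode ceiling:   ~{bandwidth_gb_s / active_gb:.0f} tok/s")
print(f"Dense decode ceiling: ~{bandwidth_gb_s / dense_gb:.0f} tok/s")
print(f"Speedup from sparsity: ~{dense_gb / active_gb:.0f}x")
```

The ~23x gap between the two ceilings is why the throughput comparison to a 10B dense model holds, and why even aggressive quants remain interactive.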

Benchmarks in the Model Card

The Hugging Face model card publishes a head-to-head table that puts M2.7 firmly in the frontier tier among open-weight models. Highlights:

  • SWE-Pro: 56.22% — matching GPT-5.3-Codex
  • SWE Multilingual: 76.5%
  • Terminal Bench 2: 57.0%
  • MLE Bench Lite: 66.6% medal rate — second only to Claude Opus 4.6 at 75.7%
  • GDPval-AA: 1495 ELO — the highest score among open-weight models
  • MM Claw: 62.7% end-to-end, with 97% skill-adherence across 40+ complex skills

These numbers carry the usual caveats for lab-reported benchmarks, but the open release means third parties can now reproduce them rather than taking MiniMax’s word for it.

Why Open Weights Matter Here

M2.7’s headline story has always been its “self-evolution” training loop — a model that autonomously modified its own scaffold code across 100+ rounds to improve on its own evaluations. That kind of claim is easy to dismiss as marketing when the weights stay behind an API. Shipping the actual model opens the door for independent researchers to audit the agent harness behavior, probe for the self-modification patterns, and run the benchmarks under controlled conditions.

For students and labs at NYU Shanghai, the practical upside is the same as it was for DeepSeek-V3 and Qwen3: a permissively licensed, frontier-competitive checkpoint that can be fine-tuned, dissected, or dropped into a local agent pipeline without negotiating API terms. The barrier is no longer access — it’s VRAM.
