MiniMax, a Shanghai-based AI company founded in 2021, has released MiniMax M1, an open-weight reasoning model that handles inputs of up to one million tokens while keeping inference efficient. The release is a notable milestone in large-scale model design, positioning M1 among the strongest open models for deep, multi-step reasoning over very long contexts (arxiv.org, en.wikipedia.org).
Key Features
- Hybrid Mixture-of-Experts Architecture
M1 employs a hybrid Mixture-of-Experts (MoE) design, activating 45.9 billion parameters per token out of a total of 456 billion. This allows the model to dynamically allocate capacity where it’s needed most, improving both performance and compute efficiency (huggingface.co).
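The routing idea behind sparse activation can be sketched as follows. This is a generic top-k MoE layer for a single token, not M1's actual architecture; the expert count, hidden dimension, and `k` are illustrative placeholders.

```python
import numpy as np

def moe_layer(x, expert_weights, router_weights, k=2):
    """Sketch of top-k Mixture-of-Experts routing for one token.

    x: (d,) token hidden state
    expert_weights: (num_experts, d, d), one weight matrix per expert
    router_weights: (num_experts, d), router projection
    Only the top-k experts are evaluated, which is why active
    parameters per token are a small fraction of the total.
    """
    logits = router_weights @ x                     # (num_experts,) router scores
    top = np.argsort(logits)[-k:]                   # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                            # softmax over the selected experts only
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 16
y = moe_layer(rng.normal(size=d),
              rng.normal(size=(num_experts, d, d)),
              rng.normal(size=(num_experts, d)))
print(y.shape)  # (8,)
```

With `k=2` of 16 experts, only 2 of the 16 expert matrices are multiplied per token; M1 applies the same principle at a 45.9B-of-456B scale.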
- Lightning Attention Mechanism
A custom “lightning attention” layer scales test-time compute far more efficiently than standard softmax attention. In the paper's benchmarks, M1 consumes only 25% of the FLOPs DeepSeek R1 requires at a generation length of 100,000 tokens, making it well suited to long-sequence tasks (huggingface.co).
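The efficiency gain comes from attention whose per-token cost does not grow with sequence length. The sketch below is a generic causal linear-attention recurrence, not MiniMax's actual lightning-attention kernel; the ReLU feature map `phi` is an illustrative assumption.

```python
import numpy as np

def causal_linear_attention(Q, K, V):
    """Causal linear (kernelized) attention in O(n * d^2) time.

    Softmax attention costs O(n^2 * d) because every new token attends
    to all previous ones. Here the history is compressed into the
    running sums S and z, so each step costs the same regardless of
    how long the sequence has become.
    """
    phi = lambda x: np.maximum(x, 0.0) + 1e-6  # positive feature map (assumption)
    d = Q.shape[-1]
    S = np.zeros((d, d))                       # running sum of phi(k) v^T
    z = np.zeros(d)                            # running sum of phi(k)
    out = np.empty_like(V)
    for t in range(Q.shape[0]):
        q, k, v = phi(Q[t]), phi(K[t]), V[t]
        S += np.outer(k, v)
        z += k
        out[t] = (q @ S) / (q @ z + 1e-9)      # normalized attention output
    return out
```

Because the state `(S, z)` is fixed-size, generation cost per token is constant, which is the property that makes 100,000-token generations tractable.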
- 1 Million-Token Native Context
With a native context eight times the 128,000-token window common among current LLMs, M1 can process entire books, codebases, or multi-session dialogues without truncation or external memory tricks (venturebeat.com).
Reinforcement Learning & Training Efficiency
MiniMax-M1 was trained using a novel reinforcement learning scaling framework:
- CISPO Algorithm: Clips the importance sampling weights rather than the token updates themselves, stabilizing and accelerating RL fine-tuning; the authors report it converges faster than competing clipping-based RL methods.
- Hybrid-Attention RL Synergy: The MoE + lightning attention combination not only boosts inference efficiency but also streamlines on-policy and off-policy training.
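The CISPO idea above can be sketched in a simplified per-token view. The clipping bounds `eps_low`/`eps_high` below are illustrative placeholders, not the paper's values, and the function omits the stop-gradient bookkeeping a real implementation needs.

```python
import numpy as np

def cispo_weighted_advantages(logp_new, logp_old, advantages,
                              eps_low=0.5, eps_high=2.0):
    """Per-token CISPO-style weighting (epsilon values illustrative).

    PPO-family methods clip the surrogate objective, which silently
    zeroes the gradient of tokens whose ratio leaves the trust region.
    CISPO instead clips the importance sampling weight itself, so
    every token still contributes a (bounded) gradient signal.
    """
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))  # IS ratio per token
    weight = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)       # clip the weight, not the update
    return weight * np.asarray(advantages)                       # scales each token's gradient

# A token whose ratio drifts to 5 is bounded at 3.0 instead of being dropped.
print(cispo_weighted_advantages([0.0, np.log(5.0)], [0.0, 0.0], [1.0, 1.0]))  # → [1. 3.]
```

The design choice matters for long reasoning traces: with tens of thousands of generated tokens per rollout, discarding off-range tokens wastes a large share of the gradient signal.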
Using CISPO on 512 NVIDIA H800 GPUs, the team completed the full RL training run in just three weeks at a total rental cost of approximately $535,000, around 200× cheaper than the training costs reported for comparable proprietary models (arxiv.org).
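The reported figures can be sanity-checked with simple arithmetic. The per-GPU-hour rental rate below is an assumption chosen to be plausible for H800 instances, not a number from the paper.

```python
# Back-of-the-envelope check of the reported RL training cost.
# The ~$2.07/GPU-hour rental rate is an assumption, not a paper figure.
gpus = 512
weeks = 3
rate_per_gpu_hour = 2.07

gpu_hours = gpus * weeks * 7 * 24        # total GPU-hours for the run
cost = gpu_hours * rate_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours ≈ ${cost:,.0f}")
```

At that assumed rate the run comes to roughly 258,000 GPU-hours and a cost in the neighborhood of the reported $535K.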
Availability & Versions
MiniMax has open-sourced two variants under the Apache 2.0 license:
- M1-40K: An intermediate checkpoint with a 40,000-token thinking budget.
- M1-80K: The fully trained model offering an 80,000-token thinking budget.
Both models, code, and documentation are available on GitHub: https://github.com/MiniMax-AI/MiniMax-M1 (arxiv.org).
Use Cases & Outlook
With its unprecedented context window and efficient compute profile, MiniMax M1 is well-suited for:
- Large-scale code synthesis and review
- Long-form document analysis (legal, medical, scientific)
- Multi-turn conversational agents that maintain coherent context
- Complex planning and reasoning tasks in software engineering environments
As open-source tooling continues to evolve, M1’s release is likely to spur innovation in applications that were previously infeasible due to context or compute constraints.