MiniMax, a Shanghai-based AI company founded in 2021, has released MiniMax M1, an open-weight reasoning model that handles inputs of up to one million tokens while keeping inference efficient. The release is a notable milestone in large-scale model design, positioning M1 among the strongest open models for deep, multi-step reasoning over very long contexts (arxiv.org, en.wikipedia.org).
Key Features
- Hybrid Mixture-of-Experts Architecture
M1 employs a hybrid Mixture-of-Experts (MoE) design, activating 45.9 billion parameters per token out of a total of 456 billion. This allows the model to dynamically allocate capacity where it’s needed most, improving both performance and compute efficiency (huggingface.co).
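The routing idea behind sparse activation can be sketched as follows. This is a generic top-k MoE layer for a single token, not M1's actual architecture; the expert count, hidden dimension, and `k` are illustrative placeholders.

```python
import numpy as np

def moe_layer(x, expert_weights, router_weights, k=2):
    """Sketch of top-k Mixture-of-Experts routing for one token.

    x: (d,) token hidden state
    expert_weights: (num_experts, d, d), one weight matrix per expert
    router_weights: (num_experts, d), router projection
    Only the top-k experts are evaluated, which is why active
    parameters per token are a small fraction of the total.
    """
    logits = router_weights @ x                     # (num_experts,) router scores
    top = np.argsort(logits)[-k:]                   # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                            # softmax over the selected experts only
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 16
y = moe_layer(rng.normal(size=d),
              rng.normal(size=(num_experts, d, d)),
              rng.normal(size=(num_experts, d)))
print(y.shape)  # (8,)
```

With `k=2` of 16 experts, only 2 of the 16 expert matrices are multiplied per token; M1 applies the same principle at a 45.9B-of-456B scale.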
- Lightning Attention Mechanism
A custom “lightning attention” layer scales test-time compute far more efficiently than standard softmax attention. In the paper's benchmarks, M1 consumes only 25% of the FLOPs DeepSeek R1 requires at a generation length of 100,000 tokens, making it well suited to long-sequence tasks (huggingface.co).
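The efficiency gain comes from attention whose per-token cost does not grow with sequence length. The sketch below is a generic causal linear-attention recurrence, not MiniMax's actual lightning-attention kernel; the ReLU feature map `phi` is an illustrative assumption.

```python
import numpy as np

def causal_linear_attention(Q, K, V):
    """Causal linear (kernelized) attention in O(n * d^2) time.

    Softmax attention costs O(n^2 * d) because every new token attends
    to all previous ones. Here the history is compressed into the
    running sums S and z, so each step costs the same regardless of
    how long the sequence has become.
    """
    phi = lambda x: np.maximum(x, 0.0) + 1e-6  # positive feature map (assumption)
    d = Q.shape[-1]
    S = np.zeros((d, d))                       # running sum of phi(k) v^T
    z = np.zeros(d)                            # running sum of phi(k)
    out = np.empty_like(V)
    for t in range(Q.shape[0]):
        q, k, v = phi(Q[t]), phi(K[t]), V[t]
        S += np.outer(k, v)
        z += k
        out[t] = (q @ S) / (q @ z + 1e-9)      # normalized attention output
    return out
```

Because the state `(S, z)` is fixed-size, generation cost per token is constant, which is the property that makes 100,000-token generations tractable.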
- 1 Million-Token Native Context
With a native context eight times the 128,000-token window common among current LLMs, M1 can process entire books, codebases, or multi-session dialogues without truncation or external memory tricks (venturebeat.com).
Reinforcement Learning & Training Efficiency
MiniMax-M1 was trained using a novel reinforcement learning scaling framework:
- CISPO Algorithm: Clips the importance sampling weights rather than the token updates themselves, stabilizing and accelerating RL fine-tuning; the authors report it converges faster than competing clipping-based RL methods.
- Hybrid-Attention RL Synergy: The MoE + lightning attention combination not only boosts inference efficiency but also streamlines on-policy and off-policy training.
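The CISPO idea above can be sketched in a simplified per-token view. The clipping bounds `eps_low`/`eps_high` below are illustrative placeholders, not the paper's values, and the function omits the stop-gradient bookkeeping a real implementation needs.

```python
import numpy as np

def cispo_weighted_advantages(logp_new, logp_old, advantages,
                              eps_low=0.5, eps_high=2.0):
    """Per-token CISPO-style weighting (epsilon values illustrative).

    PPO-family methods clip the surrogate objective, which silently
    zeroes the gradient of tokens whose ratio leaves the trust region.
    CISPO instead clips the importance sampling weight itself, so
    every token still contributes a (bounded) gradient signal.
    """
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))  # IS ratio per token
    weight = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)       # clip the weight, not the update
    return weight * np.asarray(advantages)                       # scales each token's gradient

# A token whose ratio drifts to 5 is bounded at 3.0 instead of being dropped.
print(cispo_weighted_advantages([0.0, np.log(5.0)], [0.0, 0.0], [1.0, 1.0]))  # → [1. 3.]
```

The design choice matters for long reasoning traces: with tens of thousands of generated tokens per rollout, discarding off-range tokens wastes a large share of the gradient signal.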
Using CISPO on 512 NVIDIA H800 GPUs, the team completed the full RL training run in just three weeks at a total rental cost of approximately $535,000, around 200× cheaper than the training costs reported for comparable proprietary models (arxiv.org).
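The reported figures can be sanity-checked with simple arithmetic. The per-GPU-hour rental rate below is an assumption chosen to be plausible for H800 instances, not a number from the paper.

```python
# Back-of-the-envelope check of the reported RL training cost.
# The ~$2.07/GPU-hour rental rate is an assumption, not a paper figure.
gpus = 512
weeks = 3
rate_per_gpu_hour = 2.07

gpu_hours = gpus * weeks * 7 * 24        # total GPU-hours for the run
cost = gpu_hours * rate_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours ≈ ${cost:,.0f}")
```

At that assumed rate the run comes to roughly 258,000 GPU-hours and a cost in the neighborhood of the reported $535K.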
Availability & Versions
MiniMax has open-sourced two variants under the Apache 2.0 license:
- M1-40K: An intermediate checkpoint with a 40,000-token thinking budget.
- M1-80K: The fully trained model offering an 80,000-token thinking budget.
Both models, code, and documentation are available on GitHub: https://github.com/MiniMax-AI/MiniMax-M1 (arxiv.org).
Use Cases & Outlook
With its unprecedented context window and efficient compute profile, MiniMax M1 is well-suited for:
- Large-scale code synthesis and review
- Long-form document analysis (legal, medical, scientific)
- Multi-turn conversational agents that maintain coherent context
- Complex planning and reasoning tasks in software engineering environments
As open-source tooling continues to evolve, M1’s release is likely to spur innovation in applications that were previously infeasible due to context or compute constraints.