Leanstral: Mistral’s Open-Source Proof Agent for Lean 4

On March 16, 2026, Mistral AI released Leanstral — the first open-source AI agent purpose-built for Lean 4, the formal proof assistant used across mathematical research and verified software development. With 120 billion total parameters but only 6 billion active per token, Leanstral can formally prove that AI-generated code meets its specifications — addressing one of the biggest bottlenecks in trustworthy AI coding. The model is released under the Apache 2.0 license.
Why Formal Verification Matters for AI Coding
As AI coding agents become more capable, a critical bottleneck has emerged: human review. Developers still need to manually verify that machine-generated code is correct, especially in high-stakes domains like cryptography, financial systems, and safety-critical software. Formal proof assistants like Lean 4 can mathematically guarantee that code meets its specifications — but until now, no AI model was specifically trained for this workflow.
Leanstral changes that equation. Rather than wrapping a generalist LLM around Lean’s syntax, Mistral trained a specialized model to operate natively in realistic formal repositories, understanding Lean’s type system, tactic language, and proof obligations from the ground up.
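The workflow described here, stating a specification and discharging it with tactics, can be sketched in a few lines of Lean 4. This is a toy illustration, not code from Mistral's release; the names `double` and `double_spec` are invented for the example:

```lean
-- A function together with a theorem certifying, for every input,
-- that it meets its specification. Once this compiles, the guarantee
-- is mathematical, not a matter of testing or review.
def double (n : Nat) : Nat := n + n

theorem double_spec (n : Nat) : double n = 2 * n := by
  unfold double  -- goal becomes: n + n = 2 * n
  omega          -- core linear-arithmetic tactic closes the goal
```

A proof like `double_spec` is checked by Lean's kernel, which is the "mathematical guarantee" the article refers to.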
Architecture and Performance
Leanstral uses a highly sparse Mixture-of-Experts architecture — 120B total parameters with only 6B active per token. This makes it dramatically more cost-efficient than competing approaches.
On FLTEval, a benchmark designed for realistic proof engineering scenarios, Leanstral outperforms much larger open-source models:
- Leanstral (120B-A6B): 26.3 at pass@2, 29.3 at pass@4, 31.9 at pass@16
- Qwen3.5-397B: 25.4 at pass@4 — requiring twice the compute budget to fall short
- GLM5-744B: 16.6 — a model 6x larger but far less capable at proofs
- Kimi-K2.5: 20.1
Against proprietary models, Leanstral offers a compelling cost-performance tradeoff. At pass@2, it scores 26.3 for approximately $36, versus roughly $549 for an equivalent Claude Sonnet run, about 15x cheaper. Claude Opus 4.5 remains the top performer at 39.6, but at $1,650 per run, 92x the cost of Leanstral at pass@16.
Native Lean 4 Integration via MCP
A key differentiator is Leanstral’s integration with the Lean language server through the Model Context Protocol (MCP). The model was trained specifically to maximize performance with lean-lsp-mcp, which lets it:
- Query the Lean compiler for type information and error diagnostics in real time
- Navigate and understand existing formal repositories
- Diagnose proof failures and suggest fixes based on actual compiler feedback
This is fundamentally different from models that treat Lean code as plain text — Leanstral operates with live awareness of the proof state.
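To make "live awareness of the proof state" concrete, here is a small Lean 4 proof with the intermediate goal states written as comments. These states are what the Lean language server reports after each tactic, and what an LSP/MCP-connected model can query instead of inferring them from raw text (the example itself is illustrative, not from Mistral's materials):

```lean
example (p q : Prop) (h : p ∧ q) : q ∧ p := by
  -- state: h : p ∧ q ⊢ q ∧ p
  cases h with
  | intro hp hq =>
    -- state: hp : p, hq : q ⊢ q ∧ p
    exact ⟨hq, hp⟩
  -- no goals remaining
```

A model without this feedback must predict the entire proof blind; one with it can check after every tactic whether the goal evolved as expected.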
Real-World Demonstrations
Mistral showcased two concrete use cases. In the first, Leanstral diagnosed and fixed a real Stack Exchange question about breaking changes in Lean 4.29.0-rc6, correctly identifying a definitional equality issue and proposing the appropriate abbrev solution. In the second, the model translated Rocq (formerly Coq) definitions into Lean and proved properties about imperative programs, including implementing custom notation and generating theorem proofs from specification statements alone.
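The `abbrev` fix mentioned above reflects a common Lean 4 pitfall: plain `def` is not unfolded during instance search, while `abbrev` is reducible. A hypothetical sketch of the pattern (the names `Idx` and `Idx'` are invented; this is not the actual Stack Exchange code):

```lean
-- With `def`, the numeric literal fails to elaborate: instance search
-- for `OfNat Idx 3` does not unfold `Idx` to `Nat`.
def Idx : Type := Nat
-- #check (3 : Idx)  -- error: failed to synthesize OfNat Idx 3

-- With `abbrev`, the definition is reducible, so `Idx'` unfolds to
-- `Nat` and the literal elaborates definitionally.
abbrev Idx' : Type := Nat
#check (3 : Idx')
```

Diagnosing this class of failure requires understanding definitional equality and reducibility attributes, which is exactly the kind of compiler-level reasoning the demonstration was meant to show.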
Availability
Leanstral is available through three deployment paths:
- Mistral Vibe: integrated via the /leanstral command with zero setup
- Free API endpoint: labs-leanstral-2603 for limited-time community feedback
- Self-hosted: Apache 2.0 weights on Hugging Face
Mistral also released the FLTEval evaluation suite and a technical report on the training methodology alongside the model.


