Leanstral: Mistral’s Open-Source Proof Agent for Lean 4

On March 16, 2026, Mistral AI released Leanstral — the first open-source AI agent purpose-built for Lean 4, the formal proof assistant used across mathematical research and verified software development. With 120 billion total parameters but only 6 billion active per token, Leanstral can formally prove that AI-generated code meets its specifications — addressing one of the biggest bottlenecks in trustworthy AI coding. The model is released under the Apache 2.0 license.


[Illustration, AI-generated: a formal proof tree rendered as a luminous 3D directed acyclic graph of interconnected proof nodes]

Why Formal Verification Matters for AI Coding

As AI coding agents become more capable, a critical bottleneck has emerged: human review. Developers still need to manually verify that machine-generated code is correct, especially in high-stakes domains like cryptography, financial systems, and safety-critical software. Formal proof assistants like Lean 4 can mathematically guarantee that code meets its specifications — but until now, no AI model was specifically trained for this workflow.
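The kind of guarantee at stake can be shown in a few lines of Lean 4. This is a generic sketch, not code from the Leanstral release: a function ships together with a machine-checked proof that it meets its specification.

```lean
-- A minimal example of code with a machine-checked specification.
def double (n : Nat) : Nat := n + n

-- Specification: the output of `double` is always even.
-- Once this theorem compiles, the property is mathematically guaranteed.
theorem double_even (n : Nat) : ∃ k, double n = 2 * k :=
  ⟨n, by unfold double; omega⟩
```

If an agent emits both the definition and the proof, a reviewer only needs to trust Lean's small, well-audited kernel rather than the code generator itself.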

Leanstral changes that equation. Rather than wrapping a generalist LLM around Lean’s syntax, Mistral trained a specialized model to operate natively in realistic formal repositories, understanding Lean’s type system, tactic language, and proof obligations from the ground up.

Architecture and Performance

Leanstral uses a highly sparse Mixture-of-Experts architecture — 120B total parameters with only 6B active per token. This makes it dramatically more cost-efficient than competing approaches.

On FLTEval, a benchmark designed for realistic proof engineering scenarios, Leanstral outperforms much larger open-source models:

  • Leanstral (120B-A6B): 26.3 at pass@2, 29.3 at pass@4, 31.9 at pass@16
  • Qwen3.5-397B: 25.4 at pass@4 — requiring twice the compute budget to fall short
  • GLM5-744B: 16.6 — a model 6x larger but far less capable at proofs
  • Kimi-K2.5: 20.1

Against proprietary models, Leanstral offers a compelling cost-performance tradeoff. At pass@2 it scores 26.3 for approximately $36, while Claude Sonnet spends roughly $549 to reach a comparable score — about 15x more. Claude Opus 4.5 remains the top performer at 39.6, but at $1,650 per run (92x the cost of Leanstral at pass@16).

Native Lean 4 Integration via MCP

A key differentiator is Leanstral’s integration with Lean’s language server through the Model Context Protocol (MCP). The model was specifically trained for maximal performance with lean-lsp-mcp, meaning it can:

  • Query the Lean compiler for type information and error diagnostics in real time
  • Navigate and understand existing formal repositories
  • Diagnose proof failures and suggest fixes based on actual compiler feedback
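A rough illustration of what live proof-state awareness looks like in practice — a hypothetical session, not actual Leanstral output: the agent opens an unfinished proof, reads the goal the language server reports back, and picks its next tactic from that feedback.

```lean
-- The agent queries the current goal state via the language server.
example (xs : List Nat) : (xs ++ []).length = xs.length := by
  -- Server-reported goal:
  --   xs : List Nat ⊢ (xs ++ []).length = xs.length
  -- With that feedback in hand, the model selects a closing tactic:
  simp
```

A model treating this as plain text would have to guess; one wired to the compiler sees the exact obligation remaining at each step.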

This is fundamentally different from models that treat Lean code as plain text — Leanstral operates with live awareness of the proof state.

Real-World Demonstrations

Mistral showcased two concrete use cases. In the first, Leanstral diagnosed and resolved a real Stack Exchange question about breaking changes in Lean 4.29.0-rc6, correctly identifying a definitional-equality issue and proposing the appropriate abbrev fix. In the second, the model translated Rocq (formerly Coq) definitions into Lean and proved properties about imperative programs, including implementing custom notation and generating theorem proofs from specification statements alone.
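The def-versus-abbrev distinction at the heart of that first demonstration can be sketched as follows (illustrative code, not the actual Stack Exchange snippet):

```lean
-- `abbrev` marks a definition as reducible, so the elaborator unfolds it
-- freely during unification; a plain `def` is only semi-reducible, which
-- can make terms that relied on silent unfolding fail to elaborate.
abbrev Pos := { n : Nat // 0 < n }

def one : Pos := ⟨1, Nat.one_pos⟩

-- Because `Pos` unfolds transparently, this holds by definitional equality:
example : one.val = 1 := rfl
```

Switching such an abbreviation to a plain def is exactly the kind of subtle breakage a release candidate can introduce, and it requires understanding Lean's reducibility settings — not just its surface syntax — to repair.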

Availability

Leanstral is available through three deployment paths:

  • Mistral Vibe: integrated via the /leanstral command with zero setup
  • Free API endpoint: labs-leanstral-2603 for limited-time community feedback
  • Self-hosted: Apache 2.0 weights on Hugging Face

Mistral also released the FLTEval evaluation suite and a technical report on the training methodology alongside the model.

Sources