Mistral Medium 3.5 Launches with Vibe Remote Coding Agents

On April 29, 2026, Mistral AI launched Mistral Medium 3.5 — a 128-billion-parameter dense multimodal model with a 256k context window — alongside Vibe remote agents, a cloud-based system that runs coding sessions asynchronously and in parallel. The model unifies instruction following, reasoning, and coding into a single set of weights, replaces Devstral 2 in Mistral’s Vibe coding agent, and ships under a modified MIT license with open weights on Hugging Face.

Mistral Medium 3.5 announcement banner with Vibe remote agents
Image credit: Mistral AI

What’s New in Medium 3.5

Medium 3.5 is Mistral’s first flagship “merged” model — a single dense 128B-parameter network that subsumes the company’s previous trio of Mistral Medium 3.1 (general chat), Magistral (reasoning in Le Chat), and Devstral 2 (coding). The model supports configurable reasoning effort per request, native function calling, JSON output, and 24 languages. Its vision encoder was trained from scratch to handle variable image sizes and aspect ratios, and the 256k context window is large enough to load substantial codebases or long documents in one shot.
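
To make the per-request reasoning control concrete, here is a minimal sketch against Mistral's standard chat-completions endpoint. The model ID and the `reasoning_effort` field name are assumptions based on the announcement's description, not confirmed API surface; check Mistral's docs for the actual parameter.

```python
import os
import requests

# Mistral's standard chat-completions endpoint.
API_URL = "https://api.mistral.ai/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}",
    "Content-Type": "application/json",
}

payload = {
    "model": "mistral-medium-3.5",  # assumed model ID for Medium 3.5
    "messages": [
        {"role": "user", "content": "Walk through this stack trace and propose a fix: ..."}
    ],
    # Assumed field name: the announcement describes configurable reasoning
    # effort per request but does not publish the parameter.
    "reasoning_effort": "high",
}

resp = requests.post(API_URL, headers=headers, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```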

On agentic benchmarks, Mistral reports 77.6% on SWE-Bench Verified — ahead of Devstral 2 and Qwen3.5 397B — and 91.4% on the τ³-Telecom benchmark. Pricing on the Mistral API is $1.5 per million input tokens and $7.5 per million output tokens, and the model can be self-hosted on as few as four GPUs.
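
At those rates, the cost of a large-context request is easy to estimate. The workload below is illustrative:

```python
# Quoted API rates: $1.5 per million input tokens, $7.5 per million output.
INPUT_RATE = 1.5 / 1_000_000
OUTPUT_RATE = 7.5 / 1_000_000

# Illustrative workload: a 200k-token codebase in context, an 8k-token reply.
input_tokens, output_tokens = 200_000, 8_000

cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
print(f"${cost:.2f} per request")  # -> $0.36
```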

Bar chart comparing Mistral Medium 3.5 against other models on agentic benchmarks including SWE-Bench Verified and τ³-Telecom
Image credit: Mistral AI / Hugging Face

Vibe Remote Agents: Coding That Runs Without You

The headline product launching with Medium 3.5 is Vibe remote agents — cloud-hosted coding sessions that run in isolated sandboxes and can be spawned in parallel from either the Mistral Vibe CLI or Le Chat. Instead of watching a local terminal, developers monitor sessions through file diffs, tool calls, status updates, and questions the agent surfaces along the way. A local Vibe session can also be “teleported” to the cloud with its history preserved, freeing the developer’s machine for other work.
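
Mistral has not published a programmatic API for remote sessions, so everything in the sketch below, including the `VibeClient` class and its methods, is hypothetical. It only illustrates the fan-out-and-poll pattern the product describes: spawn isolated sessions in parallel, then review results as they finish.

```python
import itertools
import time

# Hypothetical stand-in for whatever surface the Vibe CLI and Le Chat use
# to spawn remote sessions; the simulated methods just let the pattern run.
class VibeClient:
    def __init__(self):
        self._ids = itertools.count(1)
        self._deadline = {}

    def spawn(self, repo: str, task: str) -> int:
        """Start an isolated cloud session and return its ID."""
        sid = next(self._ids)
        self._deadline[sid] = time.time() + 2  # pretend the agent takes 2s
        return sid

    def status(self, sid: int) -> str:
        return "done" if time.time() >= self._deadline[sid] else "running"

    def diff(self, sid: int) -> str:
        return f"--- session {sid}: file diffs to review would appear here"

client = VibeClient()

# Fan out: each tedious task gets its own sandboxed cloud session.
tasks = [
    "upgrade lodash to v5 and fix breakages",
    "add unit tests for the payments module",
    "investigate the flaky CI job on main",
]
pending = {client.spawn("git@github.com:acme/app.git", t): t for t in tasks}

# Poll: review finished work rather than watching keystrokes.
while pending:
    for sid in list(pending):
        if client.status(sid) == "done":
            print(f"[{pending.pop(sid)}]\n{client.diff(sid)}")
    time.sleep(0.5)
```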

Mistral pitches this for the kinds of tasks that are tedious to babysit: module refactors, test generation, dependency upgrades, CI investigations, and bug fixes. Sessions can open pull requests on GitHub directly, and the agents integrate with Linear, Jira, Sentry, Slack, and Teams. Notifications fire when work completes so the developer reviews results rather than keystrokes.

Diagram showing how Mistral Vibe remote agents integrate with GitHub, Linear, Jira, Sentry, and chat platforms
Image credit: Mistral AI

Work Mode and Enterprise Reach

Le Chat also gains a new Work mode in preview — an agentic mode for cross-tool workflows like synthesizing email, calendar, and message context, generating research reports, and triaging an inbox with draft replies and Jira tickets. Connectors are enabled by default, but sensitive actions still require explicit approval. On the infrastructure side, Mistral announced that Medium 3.5 is available as NVIDIA NIM microservices and through GPU endpoints on build.nvidia.com, alongside open weights on Hugging Face for self-hosting.
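
NIM microservices expose an OpenAI-compatible API, so a self-hosted deployment can be queried with a standard client. The base URL and model ID below are placeholders; substitute whatever your deployment's `/v1/models` endpoint reports.

```python
from openai import OpenAI

# A NIM LLM container serves an OpenAI-compatible API, by default on
# port 8000 under /v1. URL and model ID here are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="mistralai/mistral-medium-3.5",  # assumed ID; check /v1/models
    messages=[{"role": "user", "content": "Summarize this incident report: ..."}],
)
print(resp.choices[0].message.content)
```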

Benchmark chart comparing Mistral Medium 3.5 across instruction following, reasoning, and coding tasks
Image credit: Mistral AI / Hugging Face

What This Means

Two things stand out. First, the model consolidation — collapsing chat, reasoning, and coding into one set of weights — is a bet that configurable test-time compute (a “reasoning effort” knob) is now a better lever than maintaining separate specialist models. That simplifies deployment for teams who previously rotated between Mistral’s chat, reasoning, and Devstral checkpoints. Second, Vibe remote agents push the same async-coding pattern that Anthropic’s Claude Managed Agents introduced earlier this month into Mistral’s open-weights stack: build the orchestration once, then let dozens of sandboxed sessions chew through unglamorous engineering work in parallel. For self-hosters, the four-GPU footprint and modified MIT license mean this is one of the few frontier-class agentic stacks that is realistically deployable on-prem.
