On February 19, 2026, Google DeepMind released Gemini 3.1 Pro — its most capable model to date and the first in the Gemini family to receive a “.1” mid-cycle update rather than the usual “.5” increment. The designation signals a focused intelligence upgrade: dramatic reasoning gains and stronger agentic performance, without a broad feature overhaul.
The headline improvement is reasoning. Gemini 3.1 Pro scores 77.1% on ARC-AGI-2, the benchmark designed to test a model’s ability to solve entirely novel logic patterns that cannot be memorized from training data. That figure is more than double the score achieved by Gemini 3 Pro — the largest single-generation reasoning jump seen in any frontier model family so far.
Google also reports gains across a range of other benchmarks beyond ARC-AGI-2.
Google attributes these gains to reasoning advances that debuted in the experimental Gemini 3 Deep Think mode and are now built into the standard model for all users.
Gemini 3.1 Pro is a natively multimodal model built on a Transformer-based Mixture-of-Experts architecture. It processes text, images, audio, video, and entire code repositories within a 1 million token context window, and can output up to 64,000 tokens in a single response — a significant leap that benefits long-form code generation, document synthesis, and extended agentic workflows.
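To make the 1 million token figure concrete, here is a back-of-envelope sketch for checking whether a codebase or document collection would plausibly fit in the window. The ~4 characters-per-token heuristic, and the choice to reserve headroom for the 64,000-token maximum output, are assumptions for illustration — real token counts depend on the tokenizer and content.

```python
# Rough feasibility check: will this much text fit in a 1M-token context?
# CHARS_PER_TOKEN is a common heuristic, not an official figure.
CONTEXT_WINDOW = 1_000_000   # tokens, per the article
MAX_OUTPUT = 64_000          # tokens, per the article
CHARS_PER_TOKEN = 4          # rough heuristic for English/code

def approx_tokens(total_chars: int) -> int:
    """Crude character-count-based token estimate."""
    return total_chars // CHARS_PER_TOKEN

def fits_in_context(total_chars: int, reserve_for_output: int = MAX_OUTPUT) -> bool:
    """True if the estimated input leaves room for a full-length response.

    Assumes (conservatively) that output headroom is carved out of the
    same window; actual accounting may differ by API.
    """
    return approx_tokens(total_chars) <= CONTEXT_WINDOW - reserve_for_output

# A 2 MB codebase (~500k estimated tokens) comfortably fits;
# a 20 MB dump (~5M estimated tokens) does not.
```

A quick check like this is useful before paying to upload a large repository: if the estimate is near the limit, you would chunk or filter files first.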
A key developer-facing addition is the thinking_level parameter, which now includes a Medium option alongside the existing Low and High settings. This lets developers tune the trade-off between reasoning depth, output latency, and API cost within a single call, without switching between model variants.
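A minimal sketch of how an application might select among the three levels per request. The helper function, its thresholds, and the model id are all hypothetical — only the "low"/"medium"/"high" values come from the article, and the exact request field name may differ in the actual Gemini API.

```python
# Hypothetical routing helper (not part of any official SDK): choose a
# thinking_level per request instead of switching model variants.
LEVELS = ("low", "medium", "high")  # "medium" is new in Gemini 3.1 Pro

def pick_thinking_level(complexity: float, latency_sensitive: bool) -> str:
    """Map a task-complexity score in [0, 1] to a thinking level.

    Latency-sensitive tasks are capped at 'medium' to bound response time;
    the 0.3/0.7 thresholds are illustrative, not tuned values.
    """
    if complexity < 0.3:
        return "low"
    if complexity < 0.7 or latency_sensitive:
        return "medium"
    return "high"

# Illustrative request payload; field and model names are assumptions.
request = {
    "model": "gemini-3.1-pro-preview",
    "generationConfig": {"thinking_level": pick_thinking_level(0.8, latency_sensitive=True)},
}
```

The point of the middle setting is exactly this kind of routing: deep reasoning where it pays for itself, cheaper and faster calls everywhere else, all against one model.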
Pricing remains unchanged from Gemini 3 Pro at $2 / $12 per million input/output tokens — considerably more affordable than comparable frontier models at similar performance tiers.
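Those rates translate into simple per-call arithmetic. The sketch below applies the article's listed prices; any volume discounts, caching rates, or long-context tiers would change the numbers.

```python
# Cost estimator using the article's listed rates:
# $2 per million input tokens, $12 per million output tokens.
INPUT_RATE = 2.0 / 1_000_000    # dollars per input token
OUTPUT_RATE = 12.0 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the published per-token rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A 200k-token codebase prompt with a 20k-token response:
# 200_000 * $2/M + 20_000 * $12/M = $0.40 + $0.24 = $0.64
cost = estimate_cost(200_000, 20_000)
```

At these rates, even near-maximum-length calls stay in the low single dollars, which is what makes the "unchanged price, doubled reasoning" framing notable.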
Gemini 3.1 Pro was built with agentic use cases as a primary target. On multi-step autonomous tasks (APEX-Agents), tool coordination (MCP Atlas: 69.2%), and autonomous web research (BrowseComp: 85.9%), the model shows substantial gains over its predecessor. These improvements make it well-suited for AI agent pipelines, complex coding assistants, and long-horizon research workflows.
The model is available in preview via the Gemini API, Google AI Studio, Vertex AI, Gemini Enterprise, Gemini CLI, and Android Studio. It also powers NotebookLM for Google AI Pro and Ultra subscribers.
Gemini 3.1 Pro’s ARC-AGI-2 score is significant not just as a number — it represents meaningful progress on a benchmark specifically designed to resist training-data pattern matching. Combined with its substantial agentic gains, the release signals that frontier models are becoming increasingly capable of handling open-ended, multi-step tasks with minimal human intervention.
For researchers and developers at NYU Shanghai, this release also raises the bar for what “capable” means in practical AI tooling. The 1M token context window makes it feasible to feed entire codebases or research document collections into a single query. The new thinking level controls give developers finer-grained cost-performance management — useful when building production-scale AI applications on limited budgets.
Google’s decision to hold the price steady while doubling reasoning performance also intensifies competition across the frontier model landscape, where pricing pressure has become as significant as raw capability.
