Anthropic released Claude Opus 4.6 on February 5, 2026, upgrading its flagship model with a 1 million token context window, significantly stronger agentic coding performance, and a new adaptive reasoning system — all at unchanged pricing. The release marks Anthropic’s fastest iteration yet on the Opus line, arriving just three months after Opus 4.5.
The headline upgrade is the 1 million token context window, now available in beta. One million tokens is enough to hold an entire enterprise codebase spanning thousands of files, roughly 750,000 words of prose (several long novels), or a full legal discovery set, all in a single prompt. For perspective, Claude Opus 4.5’s context window was 200k tokens; Opus 4.6 expands that fivefold.
Equally important is how the model handles those long contexts. On MRCR v2, a needle-in-a-haystack retrieval test that buries specific facts inside massive prompts, Opus 4.6 scores 76% — compared to just 18.5% for Claude Sonnet 4.5. The model doesn’t just hold more information; it can reliably find and reason over what matters within it.
Anthropic also introduced context compaction, an automatic summarization mechanism that condenses older context when an agent approaches its limit. This allows long-running agentic tasks to continue indefinitely without hitting context walls — a persistent pain point for developers building multi-step AI pipelines.
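The compaction pattern can be sketched in a few lines. This is an illustrative sketch of the general technique, not Anthropic's actual implementation: once the running transcript nears the context limit, older turns are condensed into a single summary message while recent turns stay verbatim. The token heuristic and the `keep_recent` cutoff are assumptions for the example.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (illustrative only).
    return max(1, len(text) // 4)

def compact(messages, limit_tokens, keep_recent=4, summarize=None):
    """Condense all but the most recent turns once the limit nears.

    `messages` is a list of {"role": ..., "content": ...} dicts;
    `summarize` is an optional callable that turns the older turns
    into a summary string (a placeholder is used if absent).
    """
    total = sum(estimate_tokens(m["content"]) for m in messages)
    if total <= limit_tokens or len(messages) <= keep_recent:
        return messages  # still within budget: nothing to do
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary_text = (summarize(old) if summarize
                    else f"[Summary of {len(old)} earlier turns]")
    # Replace the old turns with one summary message, keep the rest.
    return [{"role": "user", "content": summary_text}] + recent
```

In a real agent loop, `summarize` would itself be a model call; the key design point is that compaction is triggered by a budget check, so the loop can run indefinitely without ever exceeding the window.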
On the reasoning side, Opus 4.6 features adaptive thinking: the model detects how much extended reasoning each task actually requires and applies effort accordingly. Developers can also override this with four explicit effort levels (low, medium, high, max) for more precise control over speed and cost.
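One plausible way to wire the four effort levels into a request is to map each level to an extended-thinking token budget. The level names come from the release; the budget numbers below and the config shape are assumptions for illustration, so check the API reference before relying on them.

```python
# Hypothetical effort-to-budget mapping; the four level names are from
# the release notes, but these specific budget values are illustrative.
EFFORT_BUDGETS = {"low": 1024, "medium": 8192, "high": 32768, "max": 65536}

def thinking_config(effort: str) -> dict:
    """Build an extended-thinking config dict for a given effort level."""
    if effort not in EFFORT_BUDGETS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {"type": "enabled", "budget_tokens": EFFORT_BUDGETS[effort]}
```

A caller would pass the resulting dict as the request's thinking configuration, trading latency and cost for reasoning depth per task.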
Opus 4.6 posts notable improvements across the major AI benchmarks, positioning it as the strongest publicly available model across reasoning, coding, and long-context tasks as of its release date.
Much of Opus 4.6’s engineering investment targets agentic use cases — scenarios where the model autonomously executes multi-step tasks over extended periods. The model plans more carefully, sustains tasks longer, and operates more reliably inside large codebases. It has also improved code review and self-debugging, catching its own mistakes before they propagate.
A standout new feature is agent teams in Claude Code, Anthropic’s terminal-based coding assistant. Developers can now spin up parallel subagents that work simultaneously on different parts of a codebase, then reconcile their outputs. For large refactors or multi-file feature implementations, this dramatically reduces end-to-end turnaround.
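The agent-teams pattern itself is simple to sketch. The following is a minimal illustration of the fan-out/reconcile structure, not Claude Code's internals: independent subagents each take a disjoint slice of the work in parallel, and a coordinator merges the results. `run_subagent` here is a stub standing in for a scoped model call.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str) -> str:
    # Stand-in for a real subagent: in practice this would be a model
    # call scoped to one part of the codebase.
    return f"done: {task}"

def run_team(tasks):
    """Fan tasks out to parallel subagents, then reconcile the outputs."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        results = list(pool.map(run_subagent, tasks))
    # Reconciliation step: here just a merge keyed by task; a real
    # coordinator would also resolve conflicts between edits.
    return dict(zip(tasks, results))
```

The win comes from the tasks being largely independent (different files or modules), which is why the feature pays off most on large refactors and multi-file changes.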
Additional developer-facing additions include 128k output tokens (enabling generation of very long documents and codebases in a single call), a research preview of Claude in PowerPoint, and expanded availability across AWS Bedrock, Google Cloud Vertex AI, and Microsoft Azure AI Foundry.
Pricing is unchanged: $5 per million input tokens / $25 per million output tokens. For prompts exceeding 200k tokens (the extended context range), premium rates of $10 input / $37.50 output per million tokens apply. Given the capabilities added, this keeps Opus 4.6 competitively priced against GPT-5.2.
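The two-tier pricing is easy to work through with the stated rates. A small calculator, using the article's figures ($5/$25 standard, $10/$37.50 once the prompt exceeds 200k tokens):

```python
def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one Opus 4.6 request at the published rates."""
    if input_tokens > 200_000:           # extended-context premium tier
        in_rate, out_rate = 10.00, 37.50
    else:                                # standard tier
        in_rate, out_rate = 5.00, 25.00
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

For example, a 100k-token prompt with a 10k-token reply costs $0.75, while a 500k-token prompt with the same reply costs $5.375 because the whole request moves to the premium tier.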
On safety, Anthropic reports that Opus 4.6 maintains equivalent or superior safety standards to competing frontier models. Evaluations included interpretability-based methods and cybersecurity probes, with low misaligned behavior rates and minimal over-refusal issues.
Early enterprise partners have been positive. Notion’s AI Lead described Opus 4.6 as feeling “less like a tool and more like a capable collaborator.” Cognition’s CEO highlighted its ability to catch edge cases and bugs that previous models missed.
Claude Opus 4.6 represents a meaningful step-change in what’s practical for AI-assisted software development and knowledge work. The 1M token window, combined with context compaction and adaptive reasoning, removes some of the most frustrating limits of current agentic pipelines. For researchers managing large document sets, engineers working in sprawling codebases, or legal teams processing discovery, the practical headroom is substantial.
The ARC-AGI-2 result is especially notable: a benchmark explicitly designed to measure novel problem-solving (not memorization) saw a 31-point jump from one Opus version to the next. That’s the kind of gain that tends to signal genuine capability improvement rather than benchmark gaming.
Developers can access Opus 4.6 today through the Anthropic API, claude.ai, and all major cloud platforms.
