Anthropic Drops Flagship Safety Pledge Amid Competitive and Government Pressure

Anthropic, the AI company that has long positioned itself as the industry’s most safety-conscious research lab, has dropped the central commitment of its Responsible Scaling Policy (RSP) — the pledge to halt model training if adequate safety measures could not be guaranteed in advance. The announcement, published on February 25, 2026, marks one of the most dramatic policy reversals in the AI industry to date.

Illustration (AI-generated): a cracked translucent shield with neural network nodes glowing behind it, symbolizing fractured AI safety commitments

What Changed

Anthropic introduced its original RSP in 2023, promising never to train an AI system unless the company could guarantee beforehand that its safety measures were adequate. If a model’s capabilities outstripped the company’s ability to control them, training would pause — a hard limit that set Anthropic apart from competitors like OpenAI and Google DeepMind.

The revised RSP v3.0 eliminates this hard limit entirely. Under the new framework, Anthropic commits to delaying development only if two conditions are simultaneously met: the company believes it holds a significant lead over competitors, and it identifies catastrophic risks. If competitors are “blazing ahead,” as Chief Science Officer Jared Kaplan put it, Anthropic will no longer pause unilaterally.

“We didn’t really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead,” Kaplan told TIME. “We felt that it wouldn’t actually help anyone for us to stop training AI models.”

Why Now

Several converging pressures appear to have driven the change:

  • Competitive intensity: The AI race has accelerated sharply. Anthropic, now valued at $380 billion after a $30 billion funding round, faces pressure to keep pace with rivals shipping increasingly capable models.
  • Regulatory vacuum: Despite early hopes, comprehensive AI regulation has not materialized. The Trump Administration has taken a deregulatory stance on AI development, and global governance frameworks have stalled.
  • Pentagon pressure: The policy change arrived one day after Defense Secretary Pete Hegseth reportedly gave CEO Dario Amodei a Friday deadline to roll back AI safeguards or risk losing a $200 million Pentagon contract and potential placement on a government blacklist.
  • Evaluation complexity: Anthropic acknowledged that capability thresholds proved “far more ambiguous than anticipated,” making hard safety limits difficult to operationalize in practice.

The New Framework

RSP v3.0 introduces three structural changes in place of the original hard limits:

  1. Separation of company vs. industry commitments — Anthropic now distinguishes between mitigations it will pursue alone and an “ambitious capabilities-to-mitigations map” it says requires industry-wide adoption, particularly at higher safety levels (ASL-4 and ASL-5).
  2. Frontier Safety Roadmap — Instead of binding commitments, Anthropic will publish public goals it will “openly grade our progress towards,” including advanced red-teaming methods and information security R&D.
  3. Risk Reports with external review — The company pledges to publish detailed safety assessments every 3–6 months, reviewed by third-party AI safety experts with minimal redaction.

Criticism

The reaction from the AI safety community has been swift. Chris Painter, director of policy at METR, an AI evaluation nonprofit, warned: “This is more evidence that society is not prepared for the potential catastrophic risks posed by AI.” He raised concerns about a “frog-boiling” dynamic, where dangers escalate so gradually that no single change triggers alarm.

Critics note the inherent tension in Anthropic’s argument: the company contends that pausing while less careful actors continue would make the world less safe, yet the same logic could justify any safety-conscious lab abandoning its commitments. The policy shift is particularly striking for a company that has described itself as the AI firm with a “soul.”
