Anthropic Exposes Industrial-Scale Distillation Attacks by DeepSeek, Moonshot, and MiniMax

Anthropic has revealed that three Chinese AI laboratories — DeepSeek, Moonshot AI, and MiniMax — conducted industrial-scale “distillation attacks” on its Claude models, generating over 16 million exchanges through approximately 24,000 fraudulent accounts to extract Claude’s capabilities and train their own models. The disclosure, published February 23, 2026, has reignited debate over AI intellectual property, national security, and the competitive dynamics between US and Chinese AI companies.

[Illustration generated by AI: data streams being siphoned from a central neural network sphere by shadowy peripheral entities]

What Is a Distillation Attack?

Model distillation is a well-established machine learning technique in which a smaller “student” model is trained on the outputs of a larger, more capable “teacher” model. When done legitimately — for instance, a company distilling its own proprietary model for efficiency — it is a standard optimization practice. A distillation attack weaponizes this process: attackers send massive volumes of carefully crafted prompts to a commercial API, collect the responses, and use them as training data to replicate the target model’s capabilities — without authorization and in violation of terms of service.
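The core mechanic can be illustrated in miniature. The sketch below is a hypothetical toy, not Anthropic's or the attackers' actual pipeline: a "teacher" classifier is treated as a black box that returns only output probabilities, an "attacker" queries it at scale, and a same-shaped "student" is trained on those soft labels until it imitates the teacher. All names, sizes, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Hypothetical teacher: a fixed linear classifier whose weights the
# attacker never sees -- only its output distributions are observable.
rng = np.random.default_rng(0)
TEACHER_W = rng.normal(size=(4, 3))  # 4 features -> 3 classes

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def teacher(x):
    """Stand-in for a black-box API: returns probabilities, never weights."""
    return softmax(x @ TEACHER_W)

# Step 1: query the black box at volume and record its responses.
queries = rng.normal(size=(2000, 4))
soft_labels = teacher(queries)

# Step 2: train a student of the same shape on the collected soft labels,
# minimizing cross-entropy by plain gradient descent.
student_W = np.zeros((4, 3))
for _ in range(500):
    probs = softmax(queries @ student_W)
    grad = queries.T @ (probs - soft_labels) / len(queries)
    student_W -= 0.5 * grad

# Step 3: the student now imitates the teacher on inputs it never queried.
test_x = rng.normal(size=(500, 4))
agreement = np.mean(
    teacher(test_x).argmax(axis=1) == softmax(test_x @ student_W).argmax(axis=1)
)
print(f"student/teacher agreement: {agreement:.1%}")
```

The point of the toy is the asymmetry: the student never needed the teacher's weights or training data, only enough queries — which is precisely why volume is the defining signature of these campaigns.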

The appeal is obvious: frontier model development requires billions of dollars in compute and years of research. Distillation can shortcut that investment, allowing a competitor to approximate the performance of a leading model at a fraction of the cost.

Scale and Tactics of the Three Campaigns

Anthropic identified three distinct campaigns, each with a different target capability profile:

  • DeepSeek: Over 150,000 exchanges focused on foundational logic and alignment — specifically probing censorship-safe responses and policy-sensitive queries, suggesting an interest in reproducing Claude’s safety-tuning behaviors.
  • Moonshot AI: Over 3.4 million exchanges targeting agentic reasoning and tool use, coding and data analysis, computer-use agent development, and computer vision.
  • MiniMax: Over 13 million exchanges concentrated on agentic coding and tool use capabilities — by far the largest volume of the three campaigns.

All three campaigns shared a similar playbook: fraudulent accounts created at scale, shared payment methods, coordinated timing described by Anthropic as “load balancing,” and highly repetitive prompt structures targeting specific capability domains. One proxy network alone managed over 20,000 fraudulent accounts simultaneously. These patterns stood out clearly against normal usage, with volume and structure inconsistent with legitimate research or commercial use.

Detection: Classifiers and Behavioral Fingerprinting

Anthropic says it identified the campaigns through a combination of technical approaches:

  • Traffic classifiers that flag API usage patterns inconsistent with normal human interaction — high-repetition prompts, narrow capability targeting, and abnormal request volumes.
  • Behavioral fingerprinting that identifies coordinated account networks based on shared payment methods, IP ranges, and request timing.
  • Access controls that have since been strengthened for educational and research accounts, which are common vectors for abuse.
  • Product and model-level countermeasures designed to reduce the efficacy of distillation attempts even when extraction is attempted.
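As a rough illustration of the first two signals, the following sketch flags accounts whose prompt streams are highly repetitive or that share a payment fingerprint with other accounts. The log format, field names, and thresholds are all hypothetical; real detection systems combine many more signals.

```python
from collections import defaultdict

# Hypothetical API-log records: (account_id, payment_fingerprint, prompt).
logs = [
    ("acct_1", "card_A", "Write a sorting function"),
    ("acct_1", "card_A", "Write a sorting function"),
    ("acct_1", "card_A", "Write a sorting function"),
    ("acct_2", "card_A", "Write a parser"),
    ("acct_3", "card_B", "Plan my holiday"),
]

def flag_accounts(logs, min_unique_ratio=0.5, max_accounts_per_payment=1):
    prompts = defaultdict(list)
    payment_accounts = defaultdict(set)
    for acct, payment, prompt in logs:
        prompts[acct].append(prompt)
        payment_accounts[payment].add(acct)

    flagged = set()
    # Signal 1: highly repetitive prompt streams (few unique prompts).
    for acct, ps in prompts.items():
        if len(set(ps)) / len(ps) < min_unique_ratio:
            flagged.add(acct)
    # Signal 2: many accounts funded by a single payment method.
    for payment, accts in payment_accounts.items():
        if len(accts) > max_accounts_per_payment:
            flagged |= accts
    return flagged

print(sorted(flag_accounts(logs)))  # -> ['acct_1', 'acct_2']
```

Here acct_1 is caught by repetition, and acct_2 is swept in because it shares a payment method with acct_1 — the same kind of account-network correlation Anthropic describes as behavioral fingerprinting.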

Anthropic also shared technical indicators with other AI laboratories and relevant authorities, acknowledging that this is an industry-wide problem that cannot be solved by any single company.

National Security Implications

Beyond competitive harm, Anthropic raises a more pointed concern: distilled models stripped of safety measures. Models distilled illicitly from Claude lack the safety training that Anthropic builds in — meaning the extracted capabilities could be redeployed without guardrails against bioweapon development, malicious cyber operations, or mass surveillance by authoritarian governments. Anthropic frames this not just as a terms-of-service violation, but as a national security issue that warrants coordinated action from industry, cloud providers, and policymakers.

The timing is notable: the disclosure came as the US was actively debating AI chip export controls. TechCrunch reported that Anthropic’s accusations landed squarely in the middle of that policy conversation, with implications for how the US government regulates access to both frontier AI APIs and the compute infrastructure that enables competitive AI development.

Google also disclosed in February 2026 that attackers attempted to clone its Gemini model using over 100,000 prompts — a smaller-scale incident, but one confirming that distillation attacks are not a phenomenon isolated to a single company.

What This Means for the AI Industry

Anthropic’s disclosure marks a shift in how frontier AI companies think about API access. Open, permissive API access has been a cornerstone of developer ecosystems — but it also creates an attack surface for capability extraction at scale. The response will likely involve stricter verification requirements, more sophisticated behavioral monitoring, and possibly API usage tiers with stronger identity requirements for high-volume access.

For the broader AI community, this raises a harder question: if distillation attacks can replicate the capabilities of a frontier model at a fraction of the cost, does that undermine the competitive moat that justifies massive R&D investments? And if safety training can be stripped out in the distillation process, what does that mean for the assumption that safety-conscious frontier labs set the tone for the field?
