Chinese AI Labs Stole 16M Claude Responses, Anthropic Says

A San Francisco-based artificial intelligence company has accused three Chinese AI labs of stealing more than 16 million responses from its flagship model.

Anthropic released a detailed report on February 23 naming DeepSeek, Moonshot AI and MiniMax. The company said these firms created over 24,000 fake accounts to extract data from Claude, its popular AI assistant.

The technique is called model distillation. Companies query a powerful model like Claude and use the outputs to train their own systems. This approach lets developers skip expensive and time-consuming work.

But Anthropic said this went far beyond normal research use.

“This was industrial-scale extraction,” the company wrote in its report. “The volume, the methods and the clear violation of our terms point to organized campaigns.”

How Model Distillation Works

Model distillation has become a common technique in artificial intelligence development.

The process involves sending prompts to a larger, more capable model and collecting the responses. These responses then serve as training data for a smaller model. The smaller model learns to mimic the larger one’s outputs.
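As a rough illustration of that loop, the toy sketch below uses a fixed numeric function as a stand-in for the larger model and a two-parameter linear model as the student. Every name here (teacher, collect_responses, train_student) is hypothetical, and real distillation involves language models and far more data; the point is only that the student is trained on the teacher's outputs and never sees its internals.

```python
def teacher(x: float) -> float:
    """Stand-in for the larger model (hypothetical): an opaque function."""
    return 3.0 * x + 2.0


def collect_responses(prompts):
    """Query the teacher and keep (prompt, response) pairs as training data."""
    return [(x, teacher(x)) for x in prompts]


def train_student(pairs, lr=0.05, epochs=2000):
    """Fit a small linear student (w, b) to the teacher's responses
    with plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# "Prompts" are just numbers in [-1, 1] for this toy example.
prompts = [i / 10 for i in range(-10, 11)]
pairs = collect_responses(prompts)
w, b = train_student(pairs)
print(f"student learned w={w:.3f}, b={b:.3f}")
```

After training, the student's parameters end up close to the teacher's behavior on the sampled inputs, which is the sense in which the smaller model "mimics" the larger one.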

For companies with limited computing resources, distillation offers a path to better performance without massive infrastructure investment.

But there is a catch. Most AI companies prohibit large-scale extraction in their terms of service. They see distillation as theft when done without permission and at commercial scale.

Details of the Extraction Campaign

Methods Used to Hide Activity

The three Chinese companies used different methods to hide what they were doing.

They ran traffic through proxy services to mask their location. They set up what Anthropic calls “hydra clusters”—systems that spread queries across many accounts to avoid triggering detection.

Some queries looked normal. Others were carefully designed to pull high-quality responses on specific topics.

The companies focused on Claude’s strongest capabilities. They wanted help with complex reasoning, coding, tool use and vision tasks. Some prompts asked for answers on sensitive topics that would normally get blocked in China.

How Anthropic Detected the Theft

Anthropic spotted the pattern through IP addresses, request timing and links between accounts.

The operation grew more sophisticated over time. Early attempts were easier to catch. Later ones used better proxy services and more carefully crafted prompts.

Behavioral fingerprinting played a key role. The company analyzed patterns in how queries were submitted and how accounts behaved. This approach helped distinguish normal usage from organized extraction.
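The report's actual detection pipeline is not described in detail here, so the following is only a minimal sketch of one ingredient mentioned above: linking accounts through shared IP addresses. The event data, function names and the three-account threshold are all assumptions made for illustration, not Anthropic's method.

```python
from collections import defaultdict


def cluster_accounts(events):
    """Group accounts that are linked, directly or transitively,
    by having used the same IP address. `events` is a list of
    (account_id, ip) tuples. Uses a simple union-find structure."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_account_for_ip = {}
    for account, ip in events:
        find(account)  # register the account
        if ip in first_account_for_ip:
            union(first_account_for_ip[ip], account)
        else:
            first_account_for_ip[ip] = account

    groups = defaultdict(set)
    for account in parent:
        groups[find(account)].add(account)
    return list(groups.values())


def flag_clusters(events, min_accounts=3):
    """Return clusters large enough to warrant a closer look (assumed threshold)."""
    return [c for c in cluster_accounts(events) if len(c) >= min_accounts]


# Hypothetical log: a1, a2, a3 are chained together through two shared IPs.
events = [
    ("a1", "10.0.0.1"),
    ("a2", "10.0.0.1"),
    ("a2", "10.0.0.2"),
    ("a3", "10.0.0.2"),
    ("b1", "10.0.0.9"),
]
flagged = flag_clusters(events)
print(flagged)
```

A real system would presumably combine such links with request timing and prompt patterns, since shared IPs alone also occur for legitimate users behind the same network.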

Breaking Down the Numbers by Company

MiniMax: The Largest Operation

MiniMax led in volume with more than 13 million exchanges.

The company focused heavily on coding and tool orchestration. When Anthropic released a new Claude version, MiniMax shifted its targeting within days. This quick adaptation suggested an organized effort with dedicated resources.

Tool orchestration refers to a model’s ability to coordinate multiple functions and APIs. It represents an advanced capability that companies want for building AI agents.

Moonshot AI: Senior Staff Involvement

Moonshot AI generated over 3.4 million interactions.

Some accounts traced back to senior staff at the company. This detail matters because it suggests the extraction had official approval rather than being rogue activity by individual employees.

The queries targeted agentic reasoning, coding and computer vision. Agentic reasoning means a model can plan and execute multi-step tasks toward a goal. It is one of the most sought-after capabilities in current AI development.

DeepSeek: Focus on Reasoning

DeepSeek ran more than 150,000 exchanges.

It focused on reasoning tasks and reinforcement learning setups. Some prompts asked for responses scrubbed of politically sensitive content—essentially asking Claude to censor itself.

This last point raises interesting questions. Chinese companies face strict content rules at home. Extracting censored responses from US models could help them build systems that satisfy both performance goals and regulatory requirements.
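The per-company figures in this section can be tallied as a quick consistency check against the headline number. The values below are the lower bounds quoted in the article ("more than" or "over" in each case), so the true totals may be higher.

```python
# Lower-bound exchange counts as reported in the article.
exchanges = {
    "MiniMax": 13_000_000,
    "Moonshot AI": 3_400_000,
    "DeepSeek": 150_000,
}

total = sum(exchanges.values())
shares = {name: 100 * count / total for name, count in exchanges.items()}

print(f"total: {total:,}")
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of reported exchanges")
```

The sum comes to 16,550,000, which is consistent with the "more than 16 million" headline figure, and MiniMax alone accounts for roughly 78.5 percent of it.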

Connection to US Export Controls

The Logic Behind Chip Restrictions

The accusations come at a tense moment in US-China technology competition.

The United States has strict rules on exporting advanced chips to China. The goal is to slow Chinese AI development by cutting off access to the hardware needed to train cutting-edge models. These restrictions cover high-performance AI chips made by companies like Nvidia.

The theory holds that without advanced chips, Chinese labs cannot scale their models to match US capabilities.

How Distillation Undermines Export Rules

Distillation offers a workaround. Instead of building capability from scratch, Chinese labs can pull it from US models.

This approach reduces their need for massive computing clusters. A company can train a reasonably good model using distillation even with limited hardware. The resulting model inherits capabilities developed by the target model.

On the same day Anthropic released its report, a senior US official told Reuters that DeepSeek trained its upcoming model on Nvidia Blackwell GPUs. Those chips are banned from export to China.

The official said the training cluster sat in Inner Mongolia. According to the official, DeepSeek likely used distillation from US models, including those from Anthropic, OpenAI, Google and xAI, to boost performance. The source noted DeepSeek might try to hide evidence of banned chip use.

Nvidia has denied similar claims in the past and asked for evidence. The company faces ongoing scrutiny over whether its chips reach Chinese firms through third parties.

Previous Accusations and Industry Context

OpenAI’s Similar Claims

OpenAI made similar claims against DeepSeek last month. The company said it saw evidence of large-scale extraction campaigns targeting its models.

These accusations fit a broader pattern. As US companies build more powerful AI, they become targets for competitors who want to catch up quickly. Distillation offers a shortcut that saves time, money and computing resources.

The Irony Point

Some in the AI community pointed to irony in the accusations.

US labs trained their own models on vast amounts of public internet data, often without clear permission. That practice has led to copyright lawsuits from authors, artists and news organizations. Settlements in some cases have run into millions of dollars.

The difference, according to Anthropic, lies in the terms of service. Companies that offer public AI access set rules about how their models can be used. Large-scale extraction for commercial purposes violates those rules.

Why This Matters for AI Safety

Anthropic warned that distilled models often lose safety features.

The companies building them may strip out refusal mechanisms and alignment work, or simply fail to reproduce them. This could lead to models that are easier to misuse for military, intelligence or surveillance purposes.

A model that refuses to answer dangerous questions might lose that protection during distillation. The company doing the extraction may not replicate safety features when training its own system. The result could be powerful AI with fewer guardrails.

Who Are the Accused Companies

DeepSeek

DeepSeek gained attention in AI circles for strong technical work despite hardware restrictions. The company has built models that compete with US offerings while using fewer resources. Its upcoming release has drawn significant interest from researchers and competitors alike.

The company operates out of China and focuses on foundation model development. Prior to these accusations, it had built a reputation for efficient training methods.

Moonshot AI

Moonshot AI focuses on long-context processing. The company’s models can handle large amounts of text in a single session. This capability has applications in research, document analysis and customer service.

The alleged involvement of senior staff suggests the extraction had official backing rather than being unauthorized activity by lower-level employees.

MiniMax

MiniMax builds both text and multimodal models. The company works on systems that process images, video and audio alongside text. Its volume of extraction—more than 13 million exchanges—suggests aggressive development goals.

None of the three companies had responded publicly as of February 24. DeepSeek’s next model release is expected soon, which may bring more details to light.

What This Means for AI Competition

Cracks in Current Controls

These accusations highlight weaknesses in existing restrictions.

Export controls aim to keep computing advantages with US firms. They assume that hardware access determines who leads in AI. Distillation challenges that assumption by showing how capability can transfer through software and data.

A company with modest hardware can still build strong models if it can extract knowledge from leading systems. The gap between having the best chips and having good enough chips narrows.

Calls for Stronger Action

Anthropic called for industry coordination and policy updates. The company stressed that the window to act is narrow, before such methods erode the lead US labs hold over Chinese competitors.

Possible responses include better detection technology, legal action against violators and diplomatic pressure. None of these options offers a complete solution on its own.

The Broader US-China Dynamic

The episode fits larger US-China friction over AI dominance.

Chinese firms push forward despite restrictions, sometimes allegedly obtaining chips through smuggling or intermediaries. US firms push for stronger enforcement while continuing to improve their models. Neither side shows signs of backing down.

Observers note the debate cuts both ways. Training data ethics remain contested globally, with no universal rules yet. US companies benefit from data drawn from around the world while trying to restrict how their outputs get used.

Security Concerns and Safeguards

Loss of Built-in Protections

Distilled models could lose built-in refusals or alignment features.

This opens doors to harmful applications if deployed widely. A model designed for general use might include safeguards against dangerous queries. A model trained through distillation might skip those safeguards entirely.

The risk extends beyond individual companies. If distilled models spread widely, they could become the default AI tools in regions with less oversight. Those tools might lack the safety features built into original models.

Detection and Prevention

Anthropic stressed that behavioral fingerprinting, the analysis of how queries were submitted and how accounts behaved, helped catch these campaigns.

This approach could help other labs spot similar extraction attempts. Sharing detection methods across the industry could make it harder for bad actors to hide.

The report serves as both disclosure and warning. Anthropic wants collective action on model security and enforcement of access rules. Without coordination, extraction campaigns may become harder to detect and stop.

Immediate Developments

The story continues to develop. More details may emerge when DeepSeek releases its next model. Researchers will likely compare its capabilities to previous versions and look for signs of distillation from US models.

Other US labs may share their own findings about extraction attempts. OpenAI already made similar claims last month. Google and others could follow with their own observations.

Policy Implications

Policymakers will likely face pressure to update rules that distillation exploits.

Current export controls focus on hardware. Future rules might target the software and data side of AI development. Options include restricting API access, pursuing legal remedies and working with allies on coordinated responses.

The Long View

For now, the core dynamic remains simple. US companies built powerful AI through massive investment in computing, data and talent. Chinese companies want that power too. And, according to Anthropic, they have found a method to take it without permission.

Whether that method ultimately succeeds depends on enforcement, technology and policy. Detection tools will improve. Extraction techniques will also improve. The cat-and-mouse game shows no signs of ending.

Anthropic’s detailed report provides a window into how this game currently looks. The numbers are large. The methods are sophisticated. And the stakes involve which country leads in a technology that could reshape how people work, learn and solve problems.

Summary

Anthropic has accused three Chinese AI labs, DeepSeek, Moonshot AI and MiniMax, of stealing over 16 million responses from its Claude model through more than 24,000 fake accounts.

The companies used proxy services and distributed systems to hide their activity. They targeted Claude’s strongest capabilities including reasoning, coding and tool use.

MiniMax led with more than 13 million exchanges, followed by Moonshot AI at over 3.4 million and DeepSeek at more than 150,000. Some Moonshot AI accounts traced back to senior staff at the company.

The alleged theft undermines US export controls that aim to slow Chinese AI development through hardware restrictions. Distillation offers a workaround by transferring capability through software rather than chips.

Previous accusations from OpenAI suggest this is an industry-wide issue. Anthropic warns that distilled models may lose safety features, creating risks for misuse.

None of the accused companies have responded publicly. The story continues to develop as DeepSeek prepares to release its next model and policymakers consider responses to extraction campaigns.