Anthropic Built a Model Too Dangerous to Release

Claude Mythos, Anthropic's secret model, identified thousands of zero-days in weeks. Too dangerous to release. What it reveals about what comes next.


For years, Anthropic has hammered home one message: we develop AI responsibly. We are the serious company. In April 2026, their own results call that narrative into question. Not because they failed, but because they succeeded.


What Claude Mythos Did in a Few Weeks

The facts first.

In March 2026, Anthropic was preparing a blog post about a new model. The draft leaked accidentally through an unsecured S3 bucket. Maximum irony for a company whose central pitch is operational rigor.

What was revealed: Claude Mythos is "far ahead of any AI model in cyber capabilities." Over a few weeks of testing, the model identified thousands of critical zero-day vulnerabilities across every major operating system and popular browser. Among the documented examples:

  • A 27-year-old bug in OpenBSD, undetected despite the OS's security reputation
  • A flaw in FFmpeg that survived 5 million automated tests
  • Multiple Linux kernel vulnerabilities enabling full machine compromise

The distinctive capability is not finding isolated bugs. It is chaining multiple distinct vulnerabilities into coordinated attacks. Extended, autonomous reasoning applied to offensive security.

Result: Anthropic decided Mythos would not be released. Not yet. Maybe never.


The Structural Irony

It is worth pausing to take stock of what is happening.

Anthropic is the company founded in 2021 by ex-OpenAI researchers concerned about existential AI risks. Their guiding principle: "responsible scaling." Their architecture: Constitutional AI, designed to embed values in the model itself. Their positioning: the serious counterweight to their competitors' frantic race.

That company's model, its most aligned, most audited, most documented, has just identified thousands of critical vulnerabilities in global digital infrastructure. And Anthropic built it.

The S3 bucket leak is not a minor footnote. It illustrates a pattern known as capability drift: organizations develop systems whose practical implications outpace their operational procedures. Deploy before securing. Publish drafts on public servers. Announce internally capabilities that should have stayed confidential.

This is not a problem of bad intent. It is a problem of scale.


Project Glasswing: Virtue or Private Club?

Faced with the problem, Anthropic opted for Project Glasswing: restricted access to Mythos Preview for around forty organizations, including AWS, Apple, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft, Nvidia, and about thirty others. Stated objective: use the model to patch systems before these capabilities proliferate.

Anthropic distributed $100 million in usage credits to participants. The idea: fix the holes before anyone else can exploit them.

It is an elegant approach, and it deserves serious examination.

First, what it does well: the companies controlling global critical infrastructure will have a few months' head start to remediate vulnerabilities. For banking systems, operating systems, browsers, that is meaningful.

Then, what it does not do: it does not solve the problem. It defers it. Alex Stamos (Corridor, former Facebook CSO) put it plainly: "We have about six months before open-weight models catch up to foundation models in vulnerability detection."

In other words: within six months, open-weight models will have the same capabilities. And nobody controls those.


The Second-Order Effect Nobody Wants to Look At

This is where conventional editorial analysis stops. It is exactly where the analysis should continue.

Project Glasswing creates a temporary asymmetry between large tech companies and everyone else. The 40 selected organizations patch their systems while less well-connected competitors remain exposed. SMBs, public administrations, critical infrastructure in developing countries: all out of scope.

More fundamentally, the decision not to release Mythos does not undo its existence. Anthropic built it. The researchers who trained it now have the techniques. The training data exists. The methodology is at least partially documented in the leaked draft.

The history of dual-use technology is consistent: capabilities developed for defense end up in unintended hands. NSA tools (EternalBlue) fueled WannaCry. Military encryption techniques are in our phones. There is no reason this cycle stops with Mythos.


What This Changes Concretely

For ordinary people, the immediate threat is limited. Mythos is not in the wild. Project Glasswing is producing patches. Critical systems are probably better secured than they were before.

But the signal matters.

AI just demonstrated it can do in a few weeks what thousands of security engineers failed to do in 27 years. This is not a metaphor about "AI changing everything." It is an empirical result, documented, dated April 2026.

What this implies:

For cybersecurity: the "security" benchmarks our systems have passed until now are obsolete. An OS considered safe for three decades was not. What does that say about every audit conducted without access to Mythos?

For regulation: the Treasury and the Fed convened Wall Street on an emergency basis. The response came from financial regulators, not tech agencies. The underlying read: this is no longer a tech risk, it is a systemic risk.

For Anthropic: the company faces an existential irony. Its mission is to ensure AI benefits humanity. Its most advanced model is too dangerous to share with it.


Conclusion

The real problem is not Claude Mythos. It is that if Anthropic built it, others have too, or will soon. The decision not to release is reasonable. It resolves nothing.

In the history of dual-use technologies, non-release decisions typically buy months to a few years. After that, capabilities diffuse: through parallel research, competition, leaks.

The next time your company asks whether its systems are secure against AI threats, the correct answer in April 2026 is: probably not.


Sources: Fortune (March 26, April 7, April 10, 2026), Platformer (April 2026), CNN Business (April 3, 2026), TechCrunch (April 7, 2026)


Frequently asked questions

What is Claude Mythos?
Claude Mythos is an AI model developed by Anthropic, revealed through an accidental leak in March 2026. It is described as "far ahead of any AI model in cyber capabilities" and identified thousands of zero-day vulnerabilities across major operating systems and browsers.
Why isn't Anthropic releasing Claude Mythos?
Anthropic deemed Claude Mythos too dangerous for public release. The model can autonomously identify and chain critical vulnerabilities, posing a systemic risk if access were uncontrolled.
What is Project Glasswing?
Project Glasswing is Anthropic's restricted access program for Claude Mythos Preview. Around 40 organizations (AWS, Apple, Google, Microsoft, Nvidia, Cisco, JPMorganChase...) receive defensive access, backed by $100 million in usage credits, to patch systems before these capabilities proliferate.
How long before open-source models reach similar capabilities?
According to Alex Stamos (security expert, former Facebook CSO), open-weight models are expected to reach capabilities similar to Claude Mythos in vulnerability detection within approximately six months. Project Glasswing provides a temporary window, not a permanent solution.
Which systems did Claude Mythos compromise?
Claude Mythos identified a 27-year-old bug in OpenBSD, a flaw in FFmpeg that survived 5 million automated tests, and multiple Linux kernel vulnerabilities enabling full machine compromise. These findings span all major operating systems and popular browsers.
How are financial institutions responding?
Treasury Secretary Bessent and Fed Chair Powell convened an emergency meeting with Wall Street executives on April 10, 2026. The underlying read: the Mythos risk is no longer a tech problem, it is a systemic financial risk.