Twitter/X


Brief

Two independent evaluations, from the UK AISI and XBOW, assessed Anthropic’s Claude Mythos Preview (the model distributed through Project Glasswing), and Glasswing lead Logan Graham says they confirm a step change in autonomous cybersecurity capabilities. AISI found Mythos is the first model to solve both of its end-to-end cyber ranges, including the never-before-cleared Cooling Tower, and the only model to clear every task estimated at over 8 hours under a 2.5M-token cap; XBOW reported “token-for-token, unprecedented precision.” Anthropic says it is getting Mythos to defenders while working out safeguards and patching and disclosure processes.

Why it matters

The UK AI Security Institute (AISI) reported on 2026-05-13 that Claude Mythos Preview is the first model to solve both of its end-to-end cyber ranges, including the previously unsolved “Cooling Tower,” and the only model to clear every task estimated at over 8 hours under AISI’s deliberately low 2.5M-token cap.

Key details

  • XBOW’s offensive-security benchmark found Mythos Preview shows “token-for-token, unprecedented precision” and is the only model to succeed at subtle V8 sandbox tasks.
  • Anthropic’s Project Glasswing (led by Logan Graham) is sharing Mythos with defenders; partners say a few weeks of testing uncovered many thousands of (estimated) high- and critical-severity vulnerabilities, sometimes double what they would normally find in a year.
  • Anthropic says it is still working out the right safeguards and patching and disclosure processes, and that compute has never been a limiter in the rollout.

Source evidence

The UK AISI found Mythos Preview is the first model to solve both their cyber ranges end-to-end. No model had ever solved the AISI’s “Cooling Tower” cyber range before.

We're getting it to defenders as fast as we responsibly can. More to come on our Glasswing work soon.

Logan Graham (@logangraham)

A lot of people have been wondering about Mythos, Glasswing, and the vulns we / our partners are fixing. Today, I’m excited for us to start sharing more. (For context, I lead Glasswing @AnthropicAI.)

Two independent evaluations this week—from XBOW and the UK AISI—confirm what we've been seeing internally: Claude Mythos Preview is a step change in autonomous cybersecurity capabilities. We need to start preparing fast for a world of models with this level of capabilities.

The UK AI Security Institute tested the model we shipped at the launch of Project Glasswing and found Mythos Preview is the first model to solve both of their end-to-end cyber ranges, including one (Cooling Tower) which no model had ever cleared. But attackers (and defenders) have sophistication & cost constraints – Mythos is also the only model that clears every one of their tasks estimated over 8 hours under their deliberately low 2.5M-token cap.

XBOW tested it on their offensive security benchmarks, finding "token-for-token, unprecedented precision." It's the only model to succeed at subtle V8 sandbox work.

Other Glasswing partners shared similar stories. In a few weeks of testing, Mythos Preview has helped them find many thousands of (estimated) high + critical severity vulnerabilities, sometimes double what they'd normally find in a year.

I don't share this to boost Mythos. In fact, this is not about Mythos. It’s about preparing for the coming world of models being better, faster, cheaper, and more creative than some of the best human experts at dual use capabilities. Clearly, we need them supporting defenders as widely as can be done safely – and especially the least resourced ones.

Within a year, Mythos will probably look quite dumb (relative to other new models). And others may release openly available or unguardrailed models of Mythos-level capabilities.

We started Project Glasswing because capabilities like Mythos Preview's won't stay rare, or stay in careful hands. We are bringing it to defenders as fast as we responsibly can, while working to figure out, for example, the right safeguards and patching & disclosure processes.

Also, to be clear, compute has never been a limiter in our rollout.

Expect a fuller update on our Glasswing work in the coming days.

XBOW report: xbow.com/blog/mythos-offensi…

UK AISI report: aisi.gov.uk/blog/how-fast-is…

— https://nitter.net/logangraham/status/2054613618168082935#m