Anthropic's Mythos Model Outperforms Its Predecessors on Offensive Hacking Tests

Anthropic’s newest model, internally referred to as Mythos, has grown notably more capable at offensive cybersecurity tasks, according to The Information. UK government researchers who tested the system found the latest iteration outperforms its predecessors on hacking-related benchmarks, raising fresh questions about how frontier labs balance capability gains against real-world misuse risk.

The Information reports that the evaluation was carried out by UK government-affiliated AI safety researchers, the same group that has been running pre-deployment red-team exercises on frontier models for the better part of two years. Their finding is straightforward: each new generation of Anthropic’s flagship model handles a wider slice of the offensive security toolkit, from reconnaissance to exploit chaining, with less hand-holding.

What the researchers actually did

Frontier model evaluations of this kind typically involve a battery of cyber tasks pulled from capture-the-flag competitions, vulnerability discovery exercises, and simulated intrusion scenarios. Evaluators measure whether the model can:

Identify vulnerabilities in sample code or binaries
Write working exploit code with minimal prompting
Chain multiple steps into an end-to-end attack path
Operate autonomously inside a sandboxed target environment

The Information’s reporting doesn’t include the full benchmark numbers, but the direction is clear. Mythos closes gaps on tasks that earlier Claude versions struggled with, particularly multi-step tasks that previously required a human operator to stitch the pieces together.

Why this matters for practitioners

This is the part security teams should pay attention to. The same capabilities that worry red-teamers are the ones defenders can put to work. A model that can reason through an exploit chain end-to-end is also a model that can audit code, triage alerts, and reproduce vulnerabilities at a speed no human team can match.

What stands out here is the asymmetry. Offensive uplift compounds faster than defensive tooling can absorb it. A junior attacker armed with a frontier model gets a bigger boost than a senior defender already running modern SOC tooling. That gap is the policy problem regulators have been circling for two years, and it’s now showing up in concrete benchmark deltas rather than hypothetical scenarios.

Practical implications worth flagging:

Pen testing budgets shift. Internal red teams can run more scenarios per quarter with model-assisted tooling, but external attackers get the same lift.
Patch windows shrink. The time from disclosure to a weaponized exploit drops when a model can write the exploit on demand.
Code review changes. Static analysis tools that catch a fixed set of bug classes look thin next to a model that reasons about intent.
Access controls matter more. Anthropic gates these capabilities behind usage policies and monitoring. Self-hosted open-weight models have no such brake.

The limitations researchers acknowledged

Benchmark wins aren’t the same as real-world capability. The UK evaluators reportedly note that controlled tasks miss the messy parts of actual intrusions: noisy environments, defenders who push back, infrastructure that doesn’t match the lab. Models also still hallucinate exploit code that compiles but doesn’t work, and many of the harder gains came on tasks where the model was scaffolded with tools and feedback loops.

Anthropic’s own Responsible Scaling Policy commits the company to additional safeguards once a model crosses certain capability thresholds, including stricter access controls and pre-deployment evaluations. Whether Mythos sits above or below the relevant tripwire is the question regulators and customers will be asking next.

What comes next

Expect three things to move quickly. First, more public detail from UK AISI and US AISI on what exactly the benchmarks show. Second, pressure on every frontier lab to publish comparable numbers, not just Anthropic. Third, a wave of defensive tooling vendors retrofitting their pitches around “AI-vs-AI” security postures, some real, most marketing.

For security leaders, the move is to start running these same evaluations internally against the models your teams already use. If a frontier model can do this in a UK lab, your attackers can do it too.
