Claude Mythos Can Hack Systems: Security Evaluation

The UK’s AI Security Institute just dropped a sobering evaluation of Anthropic’s Claude Mythos Preview, and the results mark a clear inflection point in AI capabilities. As reported by Marcus on AI, the institute found that Mythos, and potentially future models, “could be directed to autonomously compromise small, weakly defended, and vulnerable systems if given network access.”

That’s a sentence worth reading twice.

How fast things moved

The AI Security Institute put this in sharp context: in 2023, the best models could barely complete beginner-level cybersecurity tasks. Now, just three years later, a frontier model can autonomously find and exploit vulnerabilities in real systems without human hand-holding.

This isn’t theoretical. The evaluation specifically tested Mythos Preview’s ability to perform offensive cyber operations, scanning, identifying weaknesses, and exploiting them with minimal human direction. The qualifier “small, weakly defended, and vulnerable” matters, but it’s cold comfort. Most small businesses, personal servers, and IoT devices fit that description perfectly.

Why this matters right now

The capability gap is closing fast. Three years from “can barely do beginner tasks” to “can autonomously compromise systems” is an alarming rate of progress. Extrapolate that curve forward and the implications for critical infrastructure get serious.
“If given network access” is the key phrase. The finding doesn’t mean Mythos is hacking things today. It means the technical capability exists if someone removes the guardrails and points it at a target. That’s a policy question, not a technical one.
This validates why evals matter. The AI Security Institute exists precisely to catch these capability jumps before they become real-world problems. This evaluation is the system working as intended.

What comes next

Anthropic has been vocal about responsible scaling and has its own internal evaluation framework for dangerous capabilities. The fact that this finding came from an independent government body, not Anthropic’s own safety team, adds credibility and raises the bar for what “safe deployment” means going forward.

For security practitioners, the takeaway is practical: the threat model now includes AI-directed autonomous attacks against soft targets. Hardening basics: patching, network segmentation, access controls, just became even more urgent. The systems most at risk are exactly the ones that have been getting away with poor security hygiene for years.

For the AI industry, this is another data point in the growing case that frontier model releases need structured evaluation before deployment, not after. The AI Security Institute’s work here sets a template other governments will likely follow.

The full evaluation details are available through the AI Security Institute’s channels, with additional context covered by Marcus on AI.

Read original article

How fast things moved

Why this matters right now

What comes next

Related: