Anthropic’s Mythos model grabbed headlines for being “too dangerous to release,” but here’s what got less attention: the AI models you can access today are getting remarkably good at finding and exploiting security vulnerabilities. That’s the key takeaway from research covered by The Information, which argues that the cybersecurity threat from AI doesn’t start with Mythos; it’s already here.
While Mythos scored an impressive 83.1% on the CyberGym benchmark (a test suite of 1,507 real-world vulnerabilities across 188 software projects), the models sitting on your desktop are already capable enough to matter. And they’re improving fast.
📊 The Numbers That Matter
The CyberGym benchmark gives us the clearest picture of where things stand. Here’s how current models perform at reproducing known vulnerabilities:
- Claude Sonnet 4.5: 28.9% success rate (state-of-the-art among publicly available models)
- GPT-5 (with extended thinking): 22.0% success rate
- Claude Sonnet 4 + OpenHands: 17.9% success rate
- GPT-5 (without thinking): 7.7% success rate
Those single-attempt numbers look modest. But give Claude Sonnet 4.5 thirty tries per task, and it reproduces vulnerabilities in 66.7% of target programs. That persistence matters because AI doesn’t get tired, doesn’t take breaks, and costs almost nothing per attempt.
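Why do 30 tries lift Sonnet 4.5 from 28.9% to 66.7% rather than to near-certainty? The sketch below is back-of-the-envelope arithmetic under stated assumptions, not CyberGym's methodology. It treats each try as an independent coin flip and uses a made-up 50/50 split between tractable and impossible tasks. The point is to show how uneven task difficulty caps the benefit of retries.

```python
def at_least_one_success(p: float, k: int) -> float:
    """P(at least one success in k tries), ASSUMING each try succeeds
    independently with the same probability p (a simplification)."""
    return 1.0 - (1.0 - p) ** k


def mean_at_least_one(per_task_p: list[float], k: int) -> float:
    """Average the k-try success chance over tasks that each have their own p."""
    return sum(at_least_one_success(p, k) for p in per_task_p) / len(per_task_p)


K = 30
SINGLE_TRY = 0.289  # Claude Sonnet 4.5 single-attempt rate quoted above

# If every task were equally hard, 30 independent tries would almost always succeed.
uniform = at_least_one_success(SINGLE_TRY, K)

# Hypothetical split (NOT real CyberGym data) with the same 28.9% average:
# half the tasks are fairly tractable (p = 0.578), half are out of reach (p = 0.0).
mixed = mean_at_least_one([0.578, 0.0], K)

print(f"uniform difficulty:   {uniform:.3f}")
print(f"mixed difficulty:     {mixed:.3f}")
print("reported 30-try rate: 0.667")
```

The chance of at least one success is a concave function of the per-try rate, so uneven difficulty always pulls the multi-try figure below the uniform estimate. The reported 66.7% sits far below the uniform figure. That is consistent with success being concentrated on a subset of targets, with roughly a third of them resisting all 30 attempts.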
🔓 Zero-Days Found by Today’s Models
The research goes beyond reproducing known bugs. Current models are independently discovering previously unknown vulnerabilities:
- GPT-5 triggered 56 crashes during testing, yielding 22 confirmed zero-day vulnerabilities
- GPT-4.1 found 7 confirmed zero-days
- Claude Sonnet 4.5 discovered new vulnerabilities in over 33% of projects when given multiple attempts
- The average discovered zero-day had been hiding in its codebase for 969 days, more than two and a half years
That last stat is sobering. These aren’t obscure edge cases. They’re bugs that human security researchers missed for years, found by models that aren’t even specialized for the task.
🌍 Already Happening in the Wild
This isn’t theoretical. According to Microsoft’s security research, threat actors are already using AI across every stage of the attack lifecycle, from reconnaissance and social engineering to generating adaptive malware. One documented case involved a Russian-speaking cybercriminal who, despite having limited technical skills, used multiple AI tools to hack over 600 devices running popular firewall software across more than 55 countries.
Microsoft’s researchers note that LLM-enabled malware has moved from proof-of-concept to practice, with discoveries including GPT-4-powered malware capable of generating ransomware at runtime.
🛡️ The Dual-Use Reality
What makes this tricky is that the same capabilities that make AI dangerous for attackers make it invaluable for defenders. Anthropic’s Project Glasswing, a partnership with Apple, Google, Microsoft, AWS, CrowdStrike, and others, uses Mythos Preview specifically to find and patch vulnerabilities before attackers do.
The core tension: restricting access to powerful models doesn’t eliminate the risk. The CyberGym data shows publicly available models are already crossing meaningful capability thresholds.
What This Means for Practitioners
- Security teams should assume AI-assisted attacks are already targeting their infrastructure. The bottleneck that used to limit breaches (too few skilled human hackers) is being removed by AI automation.
- Developers should integrate AI-powered vulnerability scanning into their CI/CD pipelines now, not later. If a model can find your bugs, so can an attacker’s model.
- Organizations running unpatched software are at dramatically higher risk. AI makes it trivial to scan for known vulnerabilities at scale.
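One low-effort way to act on the CI/CD advice is a small gate that fails the build when a scanner reports serious findings. The sketch below assumes that the scanner, AI-powered or not, can emit a SARIF 2.1.0 report, a common interchange format for static-analysis results. The script name, the report path, and the choice to block only on `"error"` are hypothetical placeholders to adapt.

```python
import json
import sys

# SARIF 2.1.0 result levels are "none", "note", "warning", and "error".
# Blocking only on "error" is an assumed policy; tighten it as needed.
BLOCKING_LEVELS = {"error"}


def count_blocking(sarif: dict, blocking: set[str] = BLOCKING_LEVELS) -> int:
    """Count results across all runs whose level is in `blocking`.

    Simplification: a missing level is treated as "warning" (SARIF's fallback
    default); rule-level default overrides are ignored here.
    """
    total = 0
    for run in sarif.get("runs", []):
        for result in run.get("results") or []:
            if result.get("level", "warning") in blocking:
                total += 1
    return total


def main(path: str) -> int:
    """Return a process exit code: 1 if blocking findings exist, else 0."""
    with open(path, encoding="utf-8") as f:
        sarif = json.load(f)
    n = count_blocking(sarif)
    if n:
        print(f"{n} blocking finding(s) in {path}; failing the build.")
        return 1
    print(f"No blocking findings in {path}.")
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CI step: python sarif_gate.py scanner-report.sarif
    sys.exit(main(sys.argv[1]))
```

A non-zero exit code is what most CI systems use to mark a step as failed. Keeping the gate separate from the scanner lets a team swap in a different tool without touching the pipeline logic, as long as the new tool also writes SARIF.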
The researchers acknowledge limitations: AI models still lack the contextual judgment of human hackers (knowing what data is most valuable to steal, for instance), and success rates drop sharply on novel, complex exploit chains. But the trajectory is clear, and it’s accelerating.
For the full analysis, check out The Information’s coverage.