AI Agent Safety: DeepMind & Anthropic Warn of Swarm Risks

Google DeepMind has a new worry, and it’s not a single rogue superintelligence. It’s the swarm. According to MIT Tech Review, researchers at the lab are sounding the alarm about what happens when millions of AI agents start interacting with each other at once, in ways nobody can fully predict.

This is significant because it shifts the safety conversation. For years the fear was one model getting too smart. Now the concern is emergent behavior from crowds of agents, each one improvising, none of them guaranteed to act rationally.

The core argument: you can’t study this in isolation

DeepMind’s Shah and Fox told MIT Tech Review that the only way to understand multi-agent systems is to run realistic simulations. Drop AI agents into sandboxes. Watch what they actually do at scale.

Their reasoning is blunt. You can’t predict the behavior of a crowd by studying one agent, or even a small group. As Fox put it, you can’t assume agents built on large language models will always act rationally. The complexity comes from huge numbers of interactions happening simultaneously.

There’s a deeper idea buried here. Some DeepMind researchers have argued that artificial general intelligence, if it’s possible at all, might not come from one super-smart model. It could emerge from a kind of agent hive mind, where the whole adds up to more than the sum of its parts. That reframes the whole race. The intelligence might live in the network, not the node.

When MIT Tech Review asked Shah about doomer-tier scenarios like economic collapse, he ruled it out for this year, then laughed and added “a while after that.” Read that how you want.

Anthropic and the zero trust playbook

DeepMind isn’t alone. A couple of weeks ago Anthropic published guidelines for deploying AI agents using a cybersecurity approach called zero trust. The starting assumption is grim by design: the system is vulnerable, the agent is an attacker, and a breach will happen.

That framing matters. Two of the biggest labs are independently telling the market that agents need to be treated as untrusted by default.

Refael Angel, cofounder and CTO of Tel Aviv cybersecurity firm Akeyless, explained to MIT Tech Review why old security models break. Every past approach assumed the machine was software written by a human, doing fixed things on fixed paths. “An agent breaks all of those assumptions,” Angel said. “It reasons, it improvises, and it can be hijacked by a single sentence buried in a document it was asked to read.”

That last line is the one to remember. Prompt injection isn’t theoretical. A poisoned document is a live attack surface.

The pushback worth hearing

Angel welcomes new safety funding but adds a sharp caveat. “No single lab should author the safety standards everyone else has to trust,” he said. He also warns that researchers can chase exotic hypotheticals while ignoring boring problems that already exist today.

Fox doesn’t entirely disagree, but notes the timeline keeps surprising everyone. Risks that were hypothetical a few years ago are real now. “The future’s come more quickly than perhaps expected.”

What stands out to me is the tension between those two views. One says focus on the spectacular future risks. The other says fix the dull, present-day holes first. Both are probably right, and the labs that win trust will do both.

What this means for you

If you’re building or deploying agents, treat this as an early warning, not background noise.

Adopt zero trust now. Assume your agent can be hijacked. Scope its permissions tight. Don’t let it read untrusted documents with full system access.
Test in crowds, not solos. A single agent passing your tests tells you little about how a fleet behaves. Simulate interactions before you ship at scale.
Guard the input layer. Prompt injection through documents, emails, and web pages is the attack that’s already here. Filter and sandbox what your agents read.
Watch who’s writing the rules. Safety standards authored by one lab become everyone’s dependency. Push for independent, shared frameworks.

The agent gold rush is moving fast, and the people building it are openly nervous. That’s the signal. When DeepMind and Anthropic both warn about the same thing in the same month, the smart move is to listen before the swarm shows up at your door. Full details are at MIT Tech Review.

Read original article

The core argument: you can’t study this in isolation

Anthropic and the zero trust playbook

The pushback worth hearing

What this means for you

Related: