Cloudflare Sets a Deadline for AI Crawlers

Cloudflare just handed the AI industry a firm deadline. Starting September 15, 2026, its default settings will block “mixed-use” crawlers from any page that hosts ads, according to TechCrunch AI. The company announced the change on Wednesday, and the target is clear: bots that blend traditional search with AI training and agent work.

TechCrunch AI reports that these crawlers will be shut out by default unless a site owner re-enables access. Until now, a single bot could quietly do three jobs at once: search indexing, AI training, and agent work. Cloudflare wants those jobs split out.

What’s Actually Changing

The new default draws a line between crawling for search and crawling for AI. Here’s how it shakes out:

  • Mixed-use crawlers get blocked from ad-hosting pages by default.
  • The rule applies to new Cloudflare customers, new sites from existing customers, and all existing free customers.
  • Site owners who want to keep letting those bots in can adjust the settings themselves.
  • The goal, per CEO Matthew Prince, is to push crawlers to “separate out search from agent use and training.”
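The default described in the list above can be read as a simple decision rule. What follows is a minimal sketch in Python, not Cloudflare’s actual configuration or API: the names `Crawler`, `is_mixed_use`, `allowed_by_default`, `page_has_ads`, and `owner_allows_mixed_use` are all hypothetical, and the purpose labels are assumptions taken from the article’s wording (search, training, agent).

```python
from dataclasses import dataclass

# Purposes the article groups under "AI" use (assumed labels, not official ones).
AI_PURPOSES = frozenset({"training", "agent"})


@dataclass(frozen=True)
class Crawler:
    """Hypothetical description of a bot and what it declares it crawls for."""
    name: str
    purposes: frozenset


def is_mixed_use(crawler: Crawler) -> bool:
    """A crawler is 'mixed-use' if it does search AND at least one AI purpose."""
    return "search" in crawler.purposes and bool(crawler.purposes & AI_PURPOSES)


def allowed_by_default(crawler: Crawler, page_has_ads: bool,
                       owner_allows_mixed_use: bool = False) -> bool:
    """Sketch of the default: block mixed-use crawlers on ad-hosting pages
    unless the site owner has explicitly re-enabled them."""
    if owner_allows_mixed_use:
        return True
    return not (is_mixed_use(crawler) and page_has_ads)


search_only = Crawler("search-bot", frozenset({"search"}))
do_everything = Crawler("general-bot", frozenset({"search", "training", "agent"}))

search_on_ads = allowed_by_default(search_only, page_has_ads=True)
mixed_on_ads = allowed_by_default(do_everything, page_has_ads=True)
mixed_no_ads = allowed_by_default(do_everything, page_has_ads=False)
mixed_opt_in = allowed_by_default(do_everything, page_has_ads=True,
                                  owner_allows_mixed_use=True)
```

In this sketch only the do-everything bot on an ad-hosting page is refused, and the owner override restores access, which mirrors the opt-back-in option the article describes.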

This matters because a lot of AI companies rely on general-purpose bots to feed both their search products and their model training. Cloudflare sits in front of a huge slice of the web, so a default change here ripples across the whole ecosystem.

The Google Jab

Cloudflare didn’t name names, but it wasn’t subtle. The company called out the “world’s largest search engine” for having access to roughly “2x more information” than other AI firms, arguing the search giant makes it hard to stay discoverable without also being used for AI. That’s a Google reference, plainly.

Google has pushed back on this framing before. As TechCrunch AI notes, Google offers a crawler control called Google-Extended that lets site owners opt out of AI training and products like Gemini without hurting their Search ranking. The catch: its main Googlebot still crawls for Search features like AI Overviews and AI Mode, so blocking Google-Extended does not keep content out of those. Opting out isn’t as clean as it sounds.
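To make the opt-out concrete: Google-Extended is set as a user-agent token in a site’s robots.txt file. The sketch below uses Python’s standard `urllib.robotparser` to show how a standard parser would read such a rule. The robots.txt contents and the `example.com` URL are illustrative assumptions, and the helper `can_fetch` is a hypothetical wrapper; this shows how the rules are interpreted, not how Google enforces them.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: opt out of Google-Extended, allow everything else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return whether the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/article"
googlebot_ok = can_fetch("Googlebot", page)        # falls through to the * rule
extended_ok = can_fetch("Google-Extended", page)   # matched by the opt-out rule
```

Under these rules Googlebot is still allowed while Google-Extended is not, which is exactly the gap the article points to: the Search crawler keeps its access, and with it the AI features that live inside Search.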

What stands out here is the leverage. Cloudflare is trying to force transparency of intent, rewarding bots that clearly state what they’re crawling for and blocking the ones that keep it vague.

From Pay Per Crawl to Pay Per Use

This fits a pattern. Cloudflare has spent the last couple of years shipping tools to give publishers more say over their content, including a marketplace called Pay Per Crawl that lets sites charge AI bots for scraping.

Now that’s evolving into “Pay Per Use.” Instead of charging only when content gets fetched, publishers can charge when their content actually creates value. The distinction matters financially: Cloudflare’s own data suggests over 50% of AI crawler traffic is spent re-fetching pages that haven’t changed, which burns publisher bandwidth for nothing.

To get it running, Cloudflare is starting with two partners:

  • Ceramic.ai – publishers get paid when their content shows up in Ceramic’s AI search results.
  • You.com – publishers get paid when You.com accesses their premium content.

Other AI companies can adapt the model to fit how they operate.

Why This Matters Now

Prince tied the move to a milestone that arrived early: bots have surpassed humans as the majority of internet traffic, a shift that wasn’t expected until next year. “Now that the majority of traffic on the Internet is non-human, we must go further and act faster so that a sustainable ecosystem can emerge,” he said.

For AI companies, this is a signal to clean up their crawling. Vague, do-everything bots are about to hit walls, while bots with declared intent get smoother access and a clear path to paying for what they use. For publishers, it’s more control and a new revenue lever that ties payment to value, not just fetches.

The September 15, 2026 date gives both sides room to adjust. Expect AI providers to start splitting their crawlers and expect more content marketplaces to follow Cloudflare’s lead. The era of free, unlimited scraping is closing, and the pricing conversation is just getting started.
