When Bots Attack: Developers' Creative War on Internet Crawlers

NEOCODE

3/28/2025

The internet has a new pest problem, and it's not spam emails or pop-up ads—it's AI web crawlers. These digital locusts are swarming open-source websites, overwhelming servers, and driving developers to distraction. But instead of waving white flags, these tech warriors are fighting back with a mix of technical ingenuity and wry humor.

The Invisible Invasion

Imagine running a website that's suddenly bombarded by thousands of relentless robot visitors. They ignore your "do not enter" signs, crash your servers, and keep coming back for more. This isn't science fiction—it's the daily reality for open-source developers.

Xe Iaso, a developer who's been on the front lines of this digital battle, describes the problem vividly: "These bots will scrape your site until it falls over, and then they'll scrape it some more." They're not just visiting; they're essentially conducting digital home invasions, clicking every link repeatedly, often multiple times in the same second.

Meet the Defenders

Anubis: The Digital Bouncer

Enter Anubis, a clever tool created by Iaso that's become a folk hero in the open-source community. Named after the Egyptian god who judges souls, this digital gatekeeper does something remarkably simple yet brilliant: it forces web requests to prove they're human.

How? By issuing a proof-of-work challenge that an ordinary browser can solve in a fraction of a second, but that becomes punishingly expensive for bots hammering a site thousands of times per minute. Requests that pass are greeted with a cute anime picture, a playful victory dance; requests that fail are turned away. The tool's popularity exploded quickly, gathering 2,000 stars and 39 forks on GitHub in just days.
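
To make the mechanism concrete, here is a minimal sketch of a hash-based proof-of-work check, the general technique Anubis builds on rather than its actual code. The client brute-forces a nonce until the SHA-256 digest of a server-issued challenge meets a difficulty target; the server verifies the answer with a single hash. The function names and difficulty value below are illustrative.

```python
import hashlib
import secrets

def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce so the hash starts with
    `difficulty` zero hex digits. Cheap once, expensive at bot scale."""
    nonce = 0
    target = "0" * difficulty
    while not hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest().startswith(target):
        nonce += 1
    return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash confirms the work was actually done."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

challenge = secrets.token_hex(16)       # issued fresh for each visitor
nonce = solve(challenge, difficulty=4)  # a browser finishes this almost instantly
assert verify(challenge, nonce, difficulty=4)
```

One legitimate visitor barely notices the delay, but a crawler replaying millions of requests has to pay that cost every single time, which is the whole point.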

Creative Countermeasures

Other developers are getting even more creative. Some are considering loading forbidden pages with absurd content—imagine an AI bot accidentally reading an article about "the benefits of drinking bleach" or "how measles improve bedroom performance." The goal? Make crawling their sites not just useless, but actively unpleasant.

The Real-World Impact

The problem isn't trivial. Drew DeVault, CEO of SourceHut, reports spending up to 100% of his weekly time fighting these crawlers. Jonathan Corbet from Linux Weekly News has seen his site slowed to a crawl. Kevin Fenzi, the sysadmin for the Fedora Linux project, was forced to take a drastic step: blocking entire countries from accessing his servers.

The Deeper Problem

At the heart of this issue is a fundamental disrespect for existing web protocols. Traditional web crawlers honored the "robots.txt" file, a simple text file that tells bots which areas of a site they can and cannot access. AI crawlers frequently ignore these digital stop signs, treating websites like all-you-can-eat buffets.
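
Honoring robots.txt is not hard; Python's standard library even ships a parser for it. The sketch below, using a placeholder domain and user-agent string, shows how a polite crawler would check the file before fetching a page.

```python
import urllib.robotparser

# A well-behaved crawler reads robots.txt before fetching anything else.
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder domain
rp.read()

# Ask whether this (hypothetical) user agent may fetch a given path.
if rp.can_fetch("ExampleBot", "https://example.com/git/"):
    print("Allowed to crawl")
else:
    print("robots.txt says no; a polite crawler stops here")
```

The rules are advisory, not enforceable, which is exactly why crawlers that skip this check are causing so much grief.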

A Cry for Change

DeVault's frustration boils over into a passionate plea: "Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of this garbage. I am begging you to stop using them, stop talking about them, stop making new ones, just stop."

The Battle Continues

While completely stopping AI crawlers might be impossible, developers are proving that resistance isn't futile. With tools like Anubis, Nepenthes, and Cloudflare's AI Labyrinth, they're making it increasingly difficult and expensive for bots to indiscriminately harvest web content.

In this digital arms race, creativity, humor, and technical skill are the developers' most powerful weapons. The message is clear: just because you can crawl doesn't mean you should.