When Bots Attack: Developers' Creative War on Internet Crawlers
The internet has a new pest problem, and it's not spam emails or pop-up ads—it's AI web crawlers. These digital locusts are swarming open-source websites, overwhelming servers, and pushing developers to their breaking point. But instead of waving white flags, these tech warriors are fighting back with a mix of technical ingenuity and wry humor.
The Invisible Invasion
Imagine running a website that's suddenly bombarded by thousands of relentless robot visitors. They ignore your "do not enter" signs, crash your servers, and keep coming back for more. This isn't science fiction—it's the daily reality for open-source developers.
Xe Iaso, a developer who's been on the front lines of this digital battle, describes the problem vividly: "These bots will scrape your site until it falls over, and then they'll scrape it some more." They're not just visiting; they're essentially conducting digital home invasions, clicking every link repeatedly, often multiple times in the same second.
Meet the Defenders
Anubis: The Digital Bouncer
Enter Anubis, a clever tool created by Iaso that's become a folk hero in the open-source community. Named after the Egyptian god who judges souls, this digital gatekeeper does something remarkably simple yet brilliant: it forces web requests to prove they're human.
How? By issuing a proof-of-work challenge: a small computational puzzle that a real browser solves in moments, but that becomes prohibitively expensive for crawlers hammering a site thousands of times over. Requests that pass are greeted with a cute anime picture, a playful victory dance; requests that fail are denied. The tool's popularity exploded quickly, gathering 2,000 stars and 39 forks on GitHub in just days.
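To see why that puzzle hurts crawlers more than people, here is a minimal sketch of a hash-based proof-of-work check in Python. It is not Anubis's actual code (Anubis runs its challenge in the visitor's browser and sits in front of the site as a reverse proxy); the difficulty setting and function names below are illustrative assumptions.

```python
import hashlib
import secrets

DIFFICULTY_BITS = 16  # illustrative; real deployments tune this cost


def leading_zero_bits(digest: bytes) -> int:
    """Count how many leading bits of the digest are zero."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits


def issue_challenge() -> str:
    """Server side: hand each new visitor a random challenge string."""
    return secrets.token_hex(16)


def solve(challenge: str, difficulty: int = DIFFICULTY_BITS) -> int:
    """Visitor side: brute-force a counter whose hash clears the difficulty bar.
    Cheap for one page view, ruinously expensive across millions of requests."""
    counter = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{counter}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return counter
        counter += 1


def verify(challenge: str, counter: int, difficulty: int = DIFFICULTY_BITS) -> bool:
    """Server side: a single hash confirms the visitor actually did the work."""
    digest = hashlib.sha256(f"{challenge}:{counter}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = issue_challenge()
    answer = solve(challenge)  # the part that costs the visitor a moment of CPU time
    print("proof accepted:", verify(challenge, answer))
```

The asymmetry is the whole trick: the server verifies an answer with a single hash, while the solver must grind through tens of thousands of them. A human loading one page never notices the delay; a crawler replaying the challenge across millions of URLs pays for every single one.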
Creative Countermeasures
Other developers are getting even more creative. Some are considering loading forbidden pages with absurd content—imagine an AI bot accidentally reading an article about "the benefits of drinking bleach" or "how measles improve bedroom performance." The goal? Make crawling their sites not just useless, but actively unpleasant.
The Real-World Impact
The problem isn't trivial. Drew DeVault, CEO of SourceHut, reports spending as much as 100 percent of his time in some weeks fighting these crawlers. Jonathan Corbet of Linux Weekly News has seen his site slowed to a crawl. Kevin Fenzi, the sysadmin for the Fedora Linux project, was forced to take a drastic step: blocking entire countries from accessing his servers.
The Deeper Problem
At the heart of this issue is a fundamental disrespect for existing web protocols. Traditional web crawlers generally honored the "robots.txt" file—a simple text file that tells bots which areas of a site they can and cannot access. AI crawlers frequently ignore these digital stop signs, treating websites like all-you-can-eat buffets.
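For readers who have never looked at one, robots.txt is just a plain text file at the root of a site, and a well-behaved crawler consults it before fetching anything else. Python's standard library even ships a parser for it; the site and user-agent names below are hypothetical placeholders, but the sketch shows the courtesy check that many AI crawlers simply skip.

```python
from urllib import robotparser

# Hypothetical site and crawler name, purely for illustration.
SITE = "https://example.org"
USER_AGENT = "ExampleBot"


def polite_can_fetch(url: str) -> bool:
    """Download and parse the site's robots.txt, then ask it for permission."""
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{SITE}/robots.txt")
    rp.read()  # fetches and parses the file
    return rp.can_fetch(USER_AGENT, url)


if __name__ == "__main__":
    url = f"{SITE}/archive/private"
    if polite_can_fetch(url):
        print(f"{USER_AGENT} may crawl {url}")
    else:
        print(f"robots.txt asks {USER_AGENT} to stay out of {url}")
```

The file has no enforcement power; it works only because crawlers choose to respect it, which is exactly the social contract now breaking down.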
A Cry for Change
DeVault's frustration boils over into a passionate plea: "Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of this garbage. I am begging you to stop using them, stop talking about them, stop making new ones, just stop."
The Battle Continues
While completely stopping AI crawlers might be impossible, developers are proving that resistance isn't futile. With tools like Anubis, Nepenthes, and Cloudflare's AI Labyrinth, they're making it increasingly difficult and expensive for bots to indiscriminately harvest web content.
In this digital arms race, creativity, humor, and technical skill are the developers' most powerful weapons. The message is clear: just because you can crawl doesn't mean you should.