What if you run a few online services for you and your friends, like a small git instance and a grocery list service, but you get absolutely hammered by “AI” scrapers?
I cannot impress upon you enough, reader, that this is not only an attack that is coordinated, it is an attack that is distributed.
I run a small set of services, basically only for me and my friends. I am not a hyperscaler, I am not a tech company, I am not even a small platform. I have a git forge where I put the shit I make, and a couple other services where me and my friends backup our files or write our grocery lists. I am not fucking Meta and I cannot scale the fuck up just because OpenAI or Anthropic or Meta or whoever is training a model that week wants to suck all the content out of my VPS ONCE MORE until it's dry.
↫ lux at VulpineCitrus
So how much traffic did the author of this piece, lux, get from "AI" scraping bots? Within a time period of 24 hours, they were hammered by 2,040,670 unique IP addresses, 98% of which were IPv4 addresses, which means that roughly 1 out of every 2,000 publicly available IPv4 addresses was involved in the scraping. Together, they made over 5 million requests. And just to reiterate: they were scraping a few very small, friends-only services run by some random person. This is absolutely insane.
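The numbers hold up as a back-of-the-envelope estimate. The sketch below assumes "98%" means exactly 0.98, uses the whole 32-bit IPv4 space (2^32 addresses) as the denominator, and treats "over 5 million" as 5,000,000. Excluding reserved and private ranges would make the ratio somewhat smaller, i.e. even closer to the quoted 1 in 2,000.

```python
# Back-of-the-envelope check of the scraping figures.
# Assumptions: 98% taken as exactly 0.98, denominator is the full
# 2**32 IPv4 space, "over 5 million" taken as 5,000,000.

UNIQUE_IPS = 2_040_670
IPV4_SHARE = 0.98
TOTAL_REQUESTS = 5_000_000
SECONDS_PER_DAY = 24 * 60 * 60
IPV4_SPACE = 2 ** 32

ipv4_scrapers = round(UNIQUE_IPS * IPV4_SHARE)          # about 2.0 million addresses
one_in_n = IPV4_SPACE / ipv4_scrapers                   # about 1 in every 2,148
requests_per_ip = TOTAL_REQUESTS / UNIQUE_IPS           # about 2.45 requests per address
requests_per_second = TOTAL_REQUESTS / SECONDS_PER_DAY  # about 58 requests per second

print(f"IPv4 scrapers: {ipv4_scrapers:,}")
print(f"1 in every {one_in_n:,.0f} IPv4 addresses")
print(f"{requests_per_ip:.2f} requests per IP, {requests_per_second:.1f} requests/s")
```

The third figure is the one that explains the "distributed" part: on average each address made only two or three requests all day, so a per-address rate limit would almost never trigger.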
If, at this point in time, with everything that we know about just how deeply unethical every single aspect of “AI” is, you’re still using and promoting it, what is wrong with you? If you’re so addicted to your “AI” girlfriend’s unending stream of useless, forgettable sycophantic slop, despite being aware of the damage you’re doing to those around you, there’s something seriously wrong with you, and you desperately need professional help. You don’t need any of this. The world doesn’t need any of this. Nobody likes the slop “AI” regurgitates, and nobody likes you for enabling it.
Get help.

I see the problem. I recognize that this is a big problem. I think the conclusion is oversimplified.
i) there are other, more productive AI use cases than the ones described above.
ii) I agree that a large fraction of AI training is based on unethical use of data.
iii) this is deadly for self-hosting. A couple of years ago, (big tech's) blacklisting was already a big problem for self-hosting. I think AI will definitely kill it.
At this point in time, if it's for one person and their friends, they should set up WireGuard (Tailscale?) and let friends use it over that VPN, with local addresses only! I have such a setup and it works OK; WireGuard can stay on all the time. The dystopian future is now.
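The simplest way to do this is to bind each service only to the WireGuard interface's address, so it is never reachable from the public Internet. As a second line of defence, the application itself can refuse anything that does not come from the VPN subnet. The sketch below is a minimal WSGI middleware doing that; the subnet `10.8.0.0/24` and the `grocery_list` app are hypothetical placeholders. It assumes no reverse proxy sits in front, because behind a proxy `REMOTE_ADDR` is the proxy's address, not the client's.

```python
import ipaddress

# Hypothetical WireGuard subnet: replace with the range your wg interface uses.
VPN_SUBNET = ipaddress.ip_network("10.8.0.0/24")

def vpn_only(app, allowed=VPN_SUBNET):
    """Wrap a WSGI app so it only answers clients whose source address is
    inside the VPN subnet; everyone else gets 403 Forbidden."""
    def wrapper(environ, start_response):
        try:
            client = ipaddress.ip_address(environ.get("REMOTE_ADDR", ""))
        except ValueError:
            client = None  # missing or malformed address: treat as outsider
        # An IPv6 client never matches an IPv4 network, so it is refused too.
        if client is None or client not in allowed:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"forbidden\n"]
        return app(environ, start_response)
    return wrapper

def grocery_list(environ, start_response):
    """Hypothetical friends-only service."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"milk\neggs\n"]

app = vpn_only(grocery_list)
```

`app` can be served with any WSGI server bound to the WireGuard address; a scraper on the public Internet either cannot connect at all or gets a 403.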
Anything you do put online will probably also get scraped and cloned too[1]. A while ago, I joined the dn42 project / community[2]. While the technical fun and social aspect is the main reason I joined up, I can also see that with the way the current public Internet is going, pretty soon a separate "geeknet" is going to be a more appealing place to hang out.
[1]=https://www.markround.com/blog/2026/04/19/sloppy-copies/
[2]=https://www.markround.com/dn42/
I see a business opportunity for services like Akamai or Cloudflare to put their smarts to work and effectively cut AI bots out. It's doable.
On the other hand, given that Google pretty much ties being indexed by its search bot to AI scraping (including AI Overviews), small players are screwed. Only the EU could hit them with antitrust action to stop the practice.
dsmogor,
As an end user on the internet, I find cloudflare's bot detection kind of annoying, even though it was meant to improve things over captchas. I know it's just clicking a box, but I still find it frustrating when I open lots of windows.
I feel it's kind of silly to think that clicking a box would stop a sophisticated bot. That isn't going to be a long-term solution. Cloudflare has to rely on traffic & IP heuristics, but even that is problematic. If you've used Wi-Fi at an airport or hotel, you may have noticed that you were being punished with rather aggressive bot detection. I'm afraid it's a cat and mouse game that can't be won.
Yeah, that’s another problem. Google’s bots are going to be given the green light because that’s basically required these days to have a viable web presence. 1) they can technically use the data they scrape however they please, including AI and 2) their monopoly becomes even more solidified as smaller competitors are blocked.
Alfman,
I believe what dsmogor meant with the references to Akamai and Cloudflare is that you can put your website behind them to stop the AI bots from hammering it.
And that's precisely one of the services offered, at least by Cloudflare (we use CF at $work and have the anti-bot protection enabled).
As an end user, their captcha (or Turnstile, as they call it) does seem pretty silly and easily bypassable.
richarson,
I’m not familiar with all their options specifically, perhaps you can fill in my knowledge gaps 🙂 In any case a bot blocking service like cloudflare needs a game plan to block bots that don’t identify themselves as such. Bot identification needs to happen using one or more heuristics:
1) IP/traffic heuristics
Subject to both false positives and false negatives especially in situations where the concurrent traffic from hundreds or thousands of users is normal (carrier grade NAT, airports, hotels, etc).
2) User agent heuristics
New user agent heuristic algorithms can be effective at first, but over time adversaries can adapt to them, making the heuristics less effective. As bots get harder to identify, heuristics need to become stricter to keep false negatives from getting through; however, it's a tradeoff that implicitly increases false positives, and that harms users.
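The false negative / false positive tradeoff of heuristic 1 is easy to reproduce with a toy per-IP sliding-window rate limiter. The limit (60 requests per minute), the 10,000 scraper keys, and the 200 users behind one carrier-grade NAT address are all made-up numbers chosen for illustration; the scraper keys are placeholder labels rather than real addresses, and this is not how any particular vendor implements its detection.

```python
from collections import defaultdict, deque

class PerIpRateLimiter:
    """Toy sliding-window limiter: at most `limit` requests per `window`
    seconds from any single source key (normally an IP address)."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps of recent allowed requests

    def allow(self, ip, now):
        q = self.hits[ip]
        # Forget requests that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

limiter = PerIpRateLimiter(limit=60, window=60.0)

# Distributed scraper: 10,000 sources making 3 requests each.
# Nobody exceeds the limit, so nothing is blocked (false negatives).
scraper_blocked = sum(
    not limiter.allow(f"scraper-{i}", now=float(r))
    for i in range(10_000)
    for r in range(3)
)

# 200 humans behind one CGNAT address, one request each within 50 seconds.
# The first 60 get through, the other 140 are blocked (false positives).
cgnat_blocked = sum(
    not limiter.allow("198.51.100.1", now=i * 0.25) for i in range(200)
)

print(scraper_blocked, cgnat_blocked)  # prints: 0 140
```

Tightening the limit to catch the scraper would only block more of the people sharing the NAT address, which is the tradeoff described above.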
If I were cloudflare, I might try to detect the IPs of bots using honeypots to make them leak their identity. But on the other side of the equation if I were a bot maker, the way to mitigate these honeypots is to only crawl resources that are human accessible. That’s the reason this is a cat and mouse game; it will keep getting harder to identify bots.
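A honeypot of the kind described here can be sketched in a few lines: a trap URL that is disallowed in robots.txt and only linked through an invisible anchor, so neither humans nor polite crawlers should ever request it. The path `/trap-7f3a/` and the `handle_request` function are hypothetical. As the paragraph above points out, a bot that only follows human-visible links never touches the trap, and flagging by IP also punishes everyone else sharing a flagged address.

```python
# Hypothetical trap path: disallowed for every crawler and linked only
# from an anchor that humans cannot see or tab to.
TRAP_PATH = "/trap-7f3a/"

ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"
HIDDEN_LINK = (
    f'<a href="{TRAP_PATH}" style="display:none" '
    'tabindex="-1" aria-hidden="true"></a>'
)

flagged_ips = set()

def handle_request(ip, path):
    """Return an HTTP status code. Any client that requests the trap is
    remembered and refused from then on."""
    if path.startswith(TRAP_PATH):
        flagged_ips.add(ip)
    if ip in flagged_ips:
        return 403
    return 200
```

A crawler that ignores robots.txt and follows every link it finds will hit the trap on its first pass and be refused afterwards.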
Alfman,
From what I understand, it is not the click on the box, but how the mouse arrow gets from its location to the box. And it seems to be rather effective, aiming for the cheap 99%.
Have been there, have done that: if you are a frequent traveler like me, then it is worth getting one of the global e-SIM solutions. Money well spent.
Andreas Reichel,
Yes, I was simplifying but you’re right they do mouse tracking. Re-captcha monitors mouse activity too.
It won’t remain effective for long as bot authors upgrade their tooling. Faking human mouse movements is trivial. Part of cloudflare’s arsenal is exploiting a bot’s simplicity. It’s hard for a simple bot to emulate a full browser environment: DOM, javascript, canvases and all. However it is not hard to see where this cat and mouse game is headed: bots will switch to using real browser engines to pass bot detection. The implication is that cloudflare will end up having to make the heuristics harder to pass. We’ve already seen this happen with recaptcha and it wasn’t pretty. We can predict that Cloudflare’s fate will be similar.
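To illustrate why faking mouse movement is trivial, here is a deliberately naive path heuristic: it flags cursor paths that are almost perfectly straight. The `straightness` metric and the 0.99 threshold are assumptions made up for this sketch; Cloudflare's and reCAPTCHA's actual signals are not public and are certainly more elaborate. A bot defeats this rule just by bending its path along a sine curve.

```python
import math

def straightness(path):
    """Ratio of straight-line distance to distance actually travelled.
    1.0 means a perfectly straight line."""
    travelled = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    if travelled == 0:
        return 1.0
    return math.dist(path[0], path[-1]) / travelled

def looks_robotic(path, threshold=0.99):
    # Hypothetical rule: flag paths that are almost perfectly straight.
    return straightness(path) >= threshold

# A naive bot moves the cursor in equal steps along a straight line.
naive = [(i * 10.0, i * 5.0) for i in range(21)]

# A barely smarter bot adds a gentle arc to the same trajectory.
curved = [
    (i * 10.0, i * 5.0 + 40 * math.sin(math.pi * i / 20))
    for i in range(21)
]

print(looks_robotic(naive), looks_robotic(curved))  # prints: True False
```

Each such rule is cheap to write and just as cheap to sidestep, which is the cat and mouse dynamic described above.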
I’m cheap and I even find domestic-only service cost too much, haha.
Incidentally, we were vacationing last year and my wife was trying to buy tickets on Groupon, and whatever bot detection they used stubbornly refused to let anyone through. After three different people tried and failed to purchase the Groupon tickets using completely different devices and logins, I pulled out my laptop and opened a VPN to initiate traffic from home, and Groupon allowed our private residential IP through with no issue. I'm curious whether a commercial VPN provider would have worked, since public endpoints are likely shared by many thousands of VPN subscribers and Groupon might just flag them all as suspicious too.
They must be losing customers this way… innocent casualties of the war on bots.
Welcome to Nigeria my friend 😀
Once you have a Nigerian IP, you are literally blocked and locked out from everywhere, with really funny consequences: before traveling there you need to apply for a visa, which you have to pay for with a credit card. However, every serious foreign bank will block any debit initiated from Nigeria, making it really hard to pay the visa fee. But I digress, sorry…
Yep, this is the new ‘normal’ these days, sadly.
Wondering. Do you feel the same about starving children? One dies every 7 seconds. 40,000 people die in cars every year in the US. Over 20,000 men commit suicide with handguns every year in the US. And you're going after AI. What's your problem? Cherry picking, and thinking you're some kind of judging god. How about you get over yourself? Who is dying? Hm? AI is bad, but not thousands-of-children-starving-to-death-every-day bad. You've chosen AI as your badness. It's not the only badness in the world, so what's your problem? Well, you've appointed yourself to try to murder people's psyche about AI. You are myopic. Welcome to humanity, hypocrite.
Hey look, a clown has stepped out of the clown car.
Explain.
https://en.wikipedia.org/wiki/Whataboutism
I’m talking about scale of damage. Comparing and contrasting is a normal process. Agree?
I don't see how responding to a rant with just another rant helps anyone.
And your examples or comparisons are extremely poorly picked. One can rightfully have concerns about the concentration of power (accelerated by corporate-owned LLMs and data) and work on solutions for peace and reducing poverty or road safety at the same time. They are not mutually exclusive, and I am sure you can see that.
Two wrongs never make a right; simple Boolean algebra.
It’s possible to be passionate about more than one problem at a time.
You should try Claude. It’s awesome. I’m sure it would even understand your frustration.
In any case, technology marches on.
Another one in the countless set of posts against AI. Again, and again and again and again…
Topic is incredibly complex, we all know that.
But blaming AI for the DDoS is nonsense here. The app was weak and the attack surface was large, AI or not. Better to focus on _your_ app instead of blaming others. And use Tailscale, zero trust and all that nice stuff that was made to cut that sh*t off your apps. Sic et simpliciter.
Thom,
I would like to recommend reading LWN:
https://lwn.net/Articles/1067234/
https://lwn.net/Articles/1068401/
Where I come from, we enjoy the argument, but we stop immediately when a man goes to the ground.
So I would like to make you an honest offer: why not join me for 1 or 2 hours on a Zoom call to experience how LLMs changed my business for good? The good, the bad, the weird. You can even write an article later about how stupid I am.
It is always good to see, experience and understand both sides of the coin.
Best and cheers, enjoy a long weekend and take a good break.
Thom is not interested in learning how LLMs are actually used. He’s got an ideological position and he’s sticking with it. The less time you spend arguing with folks like this, the better.
As always with AI Thom, your analysis is shallow and childish, not worthy of a real response. Grow up. You are old now.