What if you run a few online services for you and your friends, like a small git instance and a grocery list service, but you get absolutely hammered by “AI” scrapers?
I cannot impress upon you, reader, that this is not only an attack that is coordinated, it is an attack that is distributed.
I run a small set of services, basically only for me and my friends. I am not a hyperscaler, I am not a tech company, I am not even a small platform. I have a git forge where I put the shit I make, and a couple other services where me and my friends backup our files or write our grocery lists. I am not fucking Meta and I cannot scale the fuck up just because OpenAI or Anthropic or Meta or whoever is training a model this week wants to suck all the content out of my VPS ONCE MORE until it’s dry.
↫ lux at VulpineCitrus
So how much traffic did the author of this piece, lux, get from “AI” scraping bots? Within a period of 24 hours, they were hammered by 2,040,670 unique IP addresses, 98% of which were IPv4 addresses, which means that roughly 1 out of every 2,000 publicly available IPv4 addresses was involved in the scraping. Together, they performed over 5 million requests. And just to reiterate: they were scraping a few very small, friends-only services run by some random person. This is absolutely insane.
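For anyone who wants to check that "1 out of every 2000" figure, here is a rough back-of-the-envelope sketch. The unique IP count and the 98% IPv4 share come from lux's numbers above; the figure of roughly 3.7 billion publicly routable IPv4 addresses is my own approximation (the full 32-bit space minus reserved ranges), so treat the result as a ballpark, not a measurement.

```python
# Rough sanity check of the "1 in 2000 IPv4 addresses" claim.
unique_ips = 2_040_670   # unique scraper IPs seen in 24 hours (from the article)
ipv4_share = 0.98        # share of those that were IPv4 (from the article)

# Upper bound: the entire 32-bit IPv4 address space.
total_ipv4 = 2 ** 32
# Assumption: roughly 3.7 billion addresses remain once reserved
# ranges (private, multicast, loopback, ...) are excluded.
approx_public_ipv4 = 3_700_000_000

ipv4_scrapers = unique_ips * ipv4_share

one_in_public = approx_public_ipv4 / ipv4_scrapers
one_in_total = total_ipv4 / ipv4_scrapers

print(f"IPv4 scraper addresses: about {ipv4_scrapers:,.0f}")
print(f"About 1 in {one_in_public:,.0f} of public IPv4 addresses")
print(f"About 1 in {one_in_total:,.0f} of the full IPv4 space")
```

Depending on which denominator you pick, the ratio lands somewhere between about 1 in 1,850 and 1 in 2,150, so "1 in 2000" holds up as a fair round number.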
If, at this point in time, with everything that we know about just how deeply unethical every single aspect of “AI” is, you’re still using and promoting it, what is wrong with you? If you’re so addicted to your “AI” girlfriend’s unending stream of useless, forgettable sycophantic slop, despite being aware of the damage you’re doing to those around you, there’s something seriously wrong with you, and you desperately need professional help. You don’t need any of this. The world doesn’t need any of this. Nobody likes the slop “AI” regurgitates, and nobody likes you for enabling it.
Get help.

I see the problem. I recognize that this is a big problem. I think the conclusion is oversimplified.
i) There are other, more productive AI use cases than what is described above.
ii) I agree that a large fraction of AI training is based on unethical use of data.
iii) This is deadly for self-hosting. A couple of years ago, (big tech’s) blacklisting was already a big problem for self-hosting. I think AI will definitely kill it.
At this point in time, if it’s for one person and their friends, they should set up WireGuard (Tailscale?) and let friends use it over that VPN, with local addresses only! I have such a setup and it works OK; WireGuard can be left on all the time. The dystopian future is now.
Anything you do put online will probably get scraped and cloned too[1]. A while ago, I joined the dn42 project / community[2]. While the technical fun and the social aspect are the main reasons I joined up, I can also see that with the way the current public Internet is going, pretty soon a separate “geeknet” is going to be a more appealing place to hang out.
[1]=https://www.markround.com/blog/2026/04/19/sloppy-copies/
[2]=https://www.markround.com/dn42/
I see a business opportunity for services like Akamai or Cloudflare to put their smarts to work and effectively cut AI bots out. It’s doable.
On the other hand, given that Google pretty much ties being crawled by its search bot to AI scraping (including AI Overviews), small players are screwed. Only the EU could hit them with antitrust action to stop the practice.