Almost every website and service is getting scraped at an alarming rate. Are Lemmy servers facing this issue?
Please share mitigations you’ve seen applied to this.
It’s very easy for any ActivityPub content to be scraped; all servers practically serve the content on a silver platter to any federated server.
One link on your website leads to a never-ending labyrinth of nonsense to slowly poison an LLM.
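For anyone curious how such a labyrinth can work, here is a minimal Python sketch of the general idea, not the code of any particular tool. The function name `maze_page`, the word list, and the `/maze` path are all made up for illustration. Each page is generated from a hash of its own URL, so the maze is effectively endless but nothing has to be stored.

```python
import hashlib
import random

# Hypothetical filler vocabulary; a real tarpit would use something less obvious.
WORDS = ["solar", "punk", "garden", "rust", "kernel", "moss", "relay", "lantern"]


def maze_page(path: str, n_links: int = 5, n_words: int = 40) -> str:
    """Return a filler HTML page whose links all lead deeper into the maze."""
    # Seed the RNG from the path so the same URL always renders the same page.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    # Nonsense body text drawn from the word list.
    text = " ".join(rng.choice(WORDS) for _ in range(n_words))

    # Every link points one level deeper, so a crawler never runs out of pages.
    base = path.rstrip("/")
    links = [f"{base}/{rng.choice(WORDS)}-{rng.randrange(10**6)}" for _ in range(n_links)]
    anchors = "\n".join(f'<a href="{href}">{href}</a>' for href in links)

    return f"<html><body><p>{text}</p>\n{anchors}</body></html>"


if __name__ == "__main__":
    print(maze_page("/maze"))
```

In practice you would serve this from a path that is disallowed in robots.txt, so only crawlers that ignore the rules end up inside, and you would probably add a delay per response to keep the cost on their side.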
slrpnk.net has an AI intercept called Anubis, fwiw
We made a post about our actions here
I use this nginx extension.
They don’t really need to scrape. They just have to set up their own federated instance and the ActivityPub protocol will willingly hand it all to them in a nicely parsable format.
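To show what "nicely parsable" means here, below is a rough Python sketch. The sample payload is invented and heavily trimmed, and `extract_content` is a hypothetical helper, but the field names (`type`, `object`, `attributedTo`, `published`, `content`) come from the ActivityStreams vocabulary that ActivityPub uses. Real federated activities carry more fields and arrive signed over HTTP.

```python
import json

# Made-up, trimmed example of a "Create" activity wrapping a comment.
SAMPLE_ACTIVITY = """
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Create",
  "actor": "https://example.social/u/alice",
  "object": {
    "type": "Note",
    "id": "https://example.social/comment/1",
    "attributedTo": "https://example.social/u/alice",
    "published": "2025-01-01T12:00:00Z",
    "content": "<p>Hello, fediverse</p>"
  }
}
"""


def extract_content(raw: str):
    """Pull author, timestamp and text out of a Create activity, else None."""
    activity = json.loads(raw)
    obj = activity.get("object")
    # Only handle Create activities that embed the full object.
    if activity.get("type") != "Create" or not isinstance(obj, dict):
        return None
    return {
        "id": obj.get("id"),
        "author": obj.get("attributedTo"),
        "published": obj.get("published"),
        "content": obj.get("content"),
    }


if __name__ == "__main__":
    print(extract_content(SAMPLE_ACTIVITY))
```

No HTML parsing, no pagination, no rate limits to dodge: a handful of dictionary lookups per item is all a federated peer needs.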
I’m sure the AI devs who are so lazy that they cannot train their AI on anything other than scraped HTML can set up a Lemmy instance and point their crawlers at that.

