this post was submitted on 25 Aug 2025
chapotraphouse
@dessalines@lemmy.ml is there a way to write a ban on this into the TOS/license of the next Lemmy update?
AI scrapers do not care about TOS or licenses. There would need to be a way to stop the actual act of scraping from happening outright. I don't know anything about coding though, so I don't know how that would even work or if it's even possible.
The answer I'm expecting is:
And maybe some FOSS org would be up for taking on the legal battle? There's a big overlap of socialists + tech people on this site (not to mention the greater fediverse) who could be doing some organizing around the issue. Maybe !technology@hexbear.net should be having the appropriate discussions? Or maybe a different instance has a suitable comm?
afaik the main FOSS org that generally gets involved in legal battles of its own choosing is the Software Freedom Conservancy. I don't know everything they're up to, but the Give Up GitHub campaign is significantly about AI model training. It's much more constrained in scope, but given the nature of the source material it would probably be easier to succeed at legally. Not sure how active that is.
Just from quickly reading what they have online, it doesn't seem like the EFF made the right call this time.
archive.org cut a deal to let companies train on their data ages ago.
Those are the only ones I know. Unless some state wants to get involved in a serious way.
There are vaguely technical solutions, like proof of work javascript blocks, or obfuscating data until it's un-obfuscated by javascript or something, but nothing is 100% effective, and can degrade user experience. I'm all for the idea of us just doing mass data injection and including billions of paragraphs about how communism is great.
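For context, a proof-of-work block forces each visitor to burn a little CPU on a hash puzzle before the server responds: negligible for one human, expensive for a scraper hitting millions of pages. A minimal Python sketch of the hashcash-style idea (the difficulty and encoding here are illustrative assumptions, not any real tool's actual scheme):

```python
import hashlib
import secrets

# Illustrative difficulty: ~2**12 = 4096 hash attempts on average.
# Real deployments would tune this per-client.
DIFFICULTY_BITS = 12

def make_challenge() -> str:
    """Server side: hand the client a random challenge string."""
    return secrets.token_hex(16)

def _ok(challenge: str, nonce: int) -> bool:
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    # Valid iff the top DIFFICULTY_BITS bits of the hash are zero.
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0

def solve(challenge: str) -> int:
    """Client side: brute-force a nonce that satisfies the puzzle."""
    nonce = 0
    while not _ok(challenge, nonce):
        nonce += 1
    return nonce

def verify(challenge: str, nonce: int) -> bool:
    """Server side: checking a solution costs only one hash."""
    return _ok(challenge, nonce)

challenge = make_challenge()
nonce = solve(challenge)
print(verify(challenge, nonce))
```

The asymmetry is the whole point: solving takes thousands of hashes, verifying takes one, so the cost lands on the client.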
Easiest thing I can think of is a plugin that turns all your posts into images of the text you typed. At least that can't as easily be read by the AI.
We would need a tool like Anubis
or https://zadzmo.org/code/nepenthes/
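Nepenthes is a "tarpit": instead of blocking crawlers, it serves an endless maze of generated junk pages and links, wasting the scraper's time and poisoning its training data. A toy, stdlib-only Python sketch of the idea (the word list and page structure are made-up illustrations, not Nepenthes's actual output):

```python
import hashlib
import random

# Hypothetical filler vocabulary; a real tarpit would use a larger corpus
WORDS = ["communism", "is", "great", "solidarity", "praxis",
         "workers", "of", "the", "world", "unite"]

def tarpit_page(path: str) -> str:
    """Generate a deterministic junk page for any requested path.

    Seeding the RNG from the path means a URL looks stable if a
    crawler revisits it, while every link leads somewhere "new".
    """
    rng = random.Random(hashlib.sha256(path.encode()).digest())
    text = " ".join(rng.choice(WORDS) for _ in range(60))
    links = "".join(
        f'<a href="{path}/{rng.randrange(10**6)}">continue</a> '
        for _ in range(5)
    )
    return f"<html><body><p>{text}</p>{links}</body></html>"

print(tarpit_page("/maze")[:80])
```

Since every page links to five more that also exist, a naive crawler can descend forever.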
Sounds very nice as well, in a different direction
Both those projects are sick
Some scrapers learned to bypass it: https://www.theregister.com/2025/08/15/codeberg_beset_by_ai_bots
Would make zero difference. Same for adding more explicit blocks to robots.txt, which is basically a "keep off the grass" sign. These companies don't care and they face zero repercussions. Maybe adding a sentence to the licenses would generate some negative press for them? I guess that's about the best we could hope for.
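For reference, the robots.txt additions being discussed look like this (GPTBot, ClaudeBot, and CCBot are real crawler user agents, but honoring the file is entirely voluntary on the crawler's part):

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```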
Does an elephant care about an ant? Does Meta care about a string in the licence of a software used in four or five of the thousands of websites it shamelessly scrapes? Unfortunately I don't think so.
If you read my other comments, I obviously don't think meta would care. The purpose is:
It's also possible that neither of these things happens; it's sort of a low-effort, low-reward situation.