this post was submitted on 12 Aug 2025

181 points (100.0% liked)

Leaked list shows Facebook training their AI on multiple Lemmy instances (sopuli.xyz)

submitted 1 month ago by glowing_hans@sopuli.xyz to c/fuck_ai@lemmy.world

cross-posted from: https://lemmy.ml/post/34374544

Dropsitenews published a list of websites that Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also, Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

[–] gravitywell@sh.itjust.works 6 points 1 month ago (3 children)

It's not that hard to block them. I run what is basically a single-user Lemmy instance, and it was constantly getting hammered by Meta and Anthropic until I blocked their user agents. They just get endless redirects now.
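For other admins wondering what blocking by user agent amounts to, here is a minimal sketch of the decision logic in Python. It is not the actual server config from this instance, which isn't shown in the thread. The tokens in `BLOCKED_AGENT_TOKENS` and the function names are assumptions for illustration; check your own access logs for the user agents that actually hit you.

```python
from __future__ import annotations

# Hypothetical list of User-Agent substrings to block (assumption, not from the thread).
# Replace these with whatever shows up in your own logs.
BLOCKED_AGENT_TOKENS = ("meta-externalagent", "claudebot")


def is_blocked_agent(user_agent: str) -> bool:
    """Return True if the User-Agent contains a blocked token (case-insensitive)."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def respond(user_agent: str, path: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request.

    Blocked agents get a 302 pointing back at the path they asked for, so a
    client that follows redirects keeps looping until it gives up.
    Everyone else gets a plain 200.
    """
    if is_blocked_agent(user_agent):
        return 302, {"Location": path}
    return 200, {}
```

In a real deployment the same check would normally live in the reverse proxy rather than in application code, so the bot never reaches Lemmy at all.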

[–] pinball_wizard@lemmy.zip 5 points 1 month ago

They just get endless redirects now.

Beautiful. The thought of all those robots.txt-ignoring theft bots running in circles made me smile. Thank you.

[–] glowing_hans@sopuli.xyz 2 points 1 month ago (1 children)

sys-admin skills required

[–] gravitywell@sh.itjust.works 10 points 1 month ago

Well yes, one would need sys-admin skills to set up and maintain a Lemmy instance in the first place.

I'm happy to assist other admins if needed. Maybe I'll write up a post about it later.

[–] bvoigtlaender@feddit.org 1 points 1 month ago* (last edited 1 month ago) (1 children)

Do they actually respect that? Did you see the requests going away or getting stuck in redirects? I always expected them to switch to a generic user agent if that happens. I mean, they are arguably already disregarding copyright, so why should they adhere to a standard?

[–] gravitywell@sh.itjust.works 9 points 1 month ago

They mainly self-identify; it was super obvious when they started showing up in the logs. Even without the user agents to ID them, the volume of requests makes it clear that it's clanker behavior.
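To illustrate spotting them by volume alone, here is a rough Python sketch that counts requests per IP in an access log and reports the heavy hitters. It assumes the common "combined" log format; the regex, the function name `heavy_hitters`, and the threshold are all made up for the example and would need tuning for a real server.

```python
import re
from collections import Counter

# Assumed "combined" access log layout:
# ip - user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def heavy_hitters(lines, threshold):
    """Return {ip: request_count} for IPs with at least `threshold` requests.

    Lines that do not match the assumed format are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("ip")] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}
```

On a single-user instance almost any IP making requests at a sustained high rate is a crawler, so even a crude count like this separates them from real traffic.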

I've been meaning to set up a tarpit, but for now I just have nginx set up to redirect them, and if they still keep trying, fail2ban kicks in and blocks them by IP.
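The "keep trying and get banned" part can be sketched in Python like this. It only imitates the idea behind fail2ban's `maxretry` and `findtime` settings (ban after N hits inside a time window); it is not fail2ban itself, and the class and method names are hypothetical. A real setup would hand the banned IP to the firewall instead of keeping it in a set.

```python
from collections import defaultdict, deque


class RepeatOffenderTracker:
    """Ban an IP once it racks up `max_retry` hits within `find_time` seconds."""

    def __init__(self, max_retry: int, find_time: float):
        self.max_retry = max_retry
        self.find_time = find_time
        self._hits = defaultdict(deque)  # ip -> timestamps of recent hits
        self.banned = set()

    def record(self, ip: str, timestamp: float) -> bool:
        """Record one redirected request; return True if the IP is banned."""
        if ip in self.banned:
            return True
        hits = self._hits[ip]
        hits.append(timestamp)
        # Forget hits that have fallen out of the time window.
        while hits and timestamp - hits[0] > self.find_time:
            hits.popleft()
        if len(hits) >= self.max_retry:
            self.banned.add(ip)
            del self._hits[ip]
            return True
        return False
```

For example, `RepeatOffenderTracker(max_retry=3, find_time=60)` bans an address on its third redirected request within a minute, while a client that only shows up once every few minutes never trips it.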

It doesn't matter if they respect it or not, iptables doesn't give a fuck.