this post was submitted on 11 Jan 2025
Technology
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related content.
- Be excellent to each other!
- Mod-approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes as posts; they are OK as comments.
- Only approved bots may post; to ask whether your bot can be added, please contact us.
- Check for duplicates before posting; duplicates may be removed.
The notorious piracy database in question is Library Genesis.
Cached article:
https://web.archive.org/web/20250110075821/https://www.wired.com/story/new-documents-unredacted-meta-copyright-ai-lawsuit/
Earlier reports suggested they trained it on books from Bibliotik.
What changed?
Probably both, honestly.
In for a penny, in for a pound.
The llama-1 paper acknowledged the use of the books dataset, but libgen isn't mentioned in any of the papers, so this is new info.