this post was submitted on 04 Jul 2024

Google researchers have come out with a new paper that warns that generative AI is ruining vast swaths of the internet with fake content — which is painfully ironic because Google has been hard at work pushing the same technology to its enormous user base.

[–] andrew_bidlaw@sh.itjust.works 26 points 4 months ago (2 children)

An LLM is an insanely productive content creator. We can't say how much of the web has been generated by one at any given moment (and that's ignoring older copy-paste articles), but the share of organic material one wants to prioritise in machine learning shrinks significantly. If this tech's output is not kept out of its learning material, it predictably falls into a feedback loop, and each cycle makes it worse.

Surprisingly, pre-LLM-boom datasets may well become more valuable than contemporary ones.
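
To make the feedback loop concrete, here is a toy sketch, not a claim about how any real LLM is trained. It assumes a "model" is nothing more than a table of token frequencies, that generation 0 is "human" text with a Zipf-like distribution, and that each new generation is fitted only to text sampled from the previous one. The function name `simulate_feedback_loop` and all of its parameters are made up for illustration.

```python
import random
from collections import Counter


def simulate_feedback_loop(vocab_size=50, samples_per_generation=100,
                           generations=20, seed=0):
    """Return the number of distinct tokens the toy model can still emit,
    recorded before training and after each generation."""
    rng = random.Random(seed)
    tokens = list(range(vocab_size))

    # Generation 0 (assumption): "human" text where the token of rank k
    # has weight 1/k, so a few tokens are common and most are rare.
    weights = [1.0 / (rank + 1) for rank in range(vocab_size)]
    history = [vocab_size]

    for _ in range(generations):
        # The current model writes a corpus...
        corpus = rng.choices(tokens, weights=weights, k=samples_per_generation)
        # ...and the next model is fitted on that corpus alone.
        counts = Counter(corpus)
        weights = [counts[t] for t in tokens]
        # A token that was never sampled now has weight 0 and cannot return.
        history.append(sum(1 for w in weights if w > 0))

    return history


if __name__ == "__main__":
    print(simulate_feedback_loop())
```

In this toy the count of surviving tokens can only stay level or drop, because a rare token that misses one generation has zero weight from then on. Mixing a fixed share of the original generation-0 distribution back into `weights` each round is the toy equivalent of keeping a pre-LLM dataset around.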

[–] Naz@sh.itjust.works 4 points 4 months ago

I remember reading that from 2021 to 2023, LLMs generated more text than all humans had ever published combined, so arguably, text actually written by humans is going to become a rarity.

[–] jet@hackertalks.com 3 points 4 months ago

Garbage in, garbage out