this post was submitted on 12 Aug 2025
Programmer Humor
[–] partial_accumen@lemmy.world 29 points 2 weeks ago (1 children)

Understanding how LLMs actually work, where each word is a token (sometimes a subword or even a single letter) and the model calculates the highest-probability word to come next, this output makes me think the training data heavily included social media or pop culture, specifically around "teen angst".

I wonder if in context training would be helpful to mask the "edgelord" training data sets.
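The next-word mechanics described above can be sketched with a toy bigram table. This is purely illustrative: the tokens and probabilities are invented for the example, and a real LLM uses subword tokenization and a neural network rather than a lookup table.

```python
# Toy next-token prediction: a hand-built bigram table where each token
# maps to candidate continuations with made-up probabilities.
# Real LLMs learn these distributions from training data, which is why
# "edgelord" source text can skew the most probable continuation.

BIGRAM_PROBS = {
    "my": {"life": 0.5, "code": 0.3, "config": 0.2},
    "code": {"works": 0.4, "fails": 0.35, "is": 0.25},
    "life": {"is": 0.7, "was": 0.3},
    "is": {"meaningless": 0.6, "fine": 0.4},
}

def next_token(token: str) -> str:
    """Return the most probable next token (greedy decoding)."""
    candidates = BIGRAM_PROBS.get(token, {})
    return max(candidates, key=candidates.get) if candidates else "<end>"

def generate(start: str, max_len: int = 4) -> list[str]:
    """Greedily extend a sequence until no continuation is known."""
    out = [start]
    for _ in range(max_len):
        nxt = next_token(out[-1])
        if nxt == "<end>":
            break
        out.append(nxt)
    return out
```

With these invented probabilities, greedy decoding from "my" yields "my life is meaningless": the angsty continuation wins simply because it was assigned the highest probability, which is the commenter's point about training data leaking through.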

[–] ilinamorato@lemmy.world 9 points 2 weeks ago

Yeah, I think the training data that's most applicable here is probably troubleshooting sites (e.g., StackOverflow), GitHub comment threads, and maybe even discussion forums. That's really the only place you get this deep into configuration failures, and there is often a lot of catastrophizing there. Probably more than enough to begin pulling in old LiveJournal emo poetry.