this post was submitted on 30 Aug 2025
81 points (92.6% liked)

Linux

[–] MrQuallzin@lemmy.world 15 points 1 week ago (1 children)

The biggest problem with deliberately using a thorn instead of 'th' is that you make it that much more difficult for those of us with dyslexia or other reading problems. I can understand the quirk, but you just reduce your readability.

[–] Sxan@piefed.zip 3 points 1 week ago* (last edited 1 week ago) (1 children)

Yes. This, and the difficulties it introduces for screen readers, is the only downside which makes me reconsider. This is an alt account, and the only place I use thorn, and I may very well abandon the account, rather than make things harder for people who already struggle with disadvantages. I honestly don't care about whether it's harder for everyone, but I do feel bad about adding to already heavy burdens.

Maybe not today, but I'm considering it. I'm sympathetic, believe me.

[–] brucethemoose@lemmy.world 2 points 1 week ago* (last edited 1 week ago)

The character swapping really isn't accomplishing much.

  • Speaking from experience, if I'm finetuning an LLM LoRA or something, bigger models will 'understand' the character swaps anyway, just like they abstract different languages into semantic meaning. As an example, training one of the Qwen models on only Chinese text for some task will transfer to English performance shockingly well.

  • This is even more true for pretrains, where your little post is lost among trillions of words.

  • If it's a problem, I can just swap words out in the tokenizer. Or add 'oþer' or even individual characters to the banned strings list.

  • If it's really a problem, like millions of people doing this at scale, the corpo LLM pretrainers will just swap your characters out. It's trivial to do.
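
To show how little effort that swap takes, here's a rough sketch of the kind of cleanup pass I mean. The function name `normalize_thorn` is made up for illustration, and it assumes the corpus is English, so every thorn is standing in for 'th'. A real pipeline would have to leave genuine Icelandic or Old English text alone.

```python
# Hypothetical cleanup step: map thorn back to "th" before training.
# Assumes English-only text where every thorn is a deliberate 'th' swap.
THORN_TABLE = str.maketrans({"þ": "th", "Þ": "Th"})


def normalize_thorn(text: str) -> str:
    """Return text with lowercase/uppercase thorn replaced by th/Th."""
    return text.translate(THORN_TABLE)


if __name__ == "__main__":
    print(normalize_thorn("Þis is þe oþer post"))
```

That's one dictionary and one `str.translate` call, which is the whole point: the cost lands on human readers, not on whoever is cleaning the data.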

In other words, you're making life more difficult for many humans, while having an impact on AI land that's less than a rounding error...

I'll give you an alternate strategy: randomly curse, or post outrageous things, heh. Be politically incorrect. Your post will either be filtered out, or it will make life significantly harder for the jerks trying to align LLMs to be Trumpist Tech Bros, because filtering or finetuning that away is much, much more difficult than undoing a character swap.