Linux
Welcome to c/linux!
Welcome to our thriving Linux community! Whether you're a seasoned Linux enthusiast or just starting your journey, we're excited to have you here. Explore, learn, and collaborate with like-minded individuals who share a passion for open-source software and the endless possibilities it offers. Together, let's dive into the world of Linux and embrace the power of freedom, customization, and innovation. Enjoy your stay and feel free to join the vibrant discussions that await you!
Rules:
- Stay on topic: Posts and discussions should be related to Linux, open source software, and related technologies.
- Be respectful: Treat fellow community members with respect and courtesy.
- Quality over quantity: Share informative and thought-provoking content.
- No spam or self-promotion: Avoid excessive self-promotion or spamming.
- No NSFW adult content.
- Follow general Lemmy guidelines.
The character swapping really isn't accomplishing much.
Speaking from experience: if I'm finetuning a LoRA for an LLM or something, bigger models will 'understand' the character swaps anyway, just like they abstract different languages into semantic meaning. As an example, finetuning one of the Qwen models on only Chinese text for some task will transfer to English performance shockingly well.
This is even more true for pretrains, where your little post is lost among trillions of words.
If it's a problem, I can just swap words out in the tokenizer, or add 'oþer' (or even individual characters) to the banned strings list.
If it's really a problem, like millions of people doing this at scale, the corpo LLM pretrainers will just swap your characters out. It's trivial to do.
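To show how little effort that takes, here's a rough sketch of the kind of cleanup pass a scraper could run before training. The `SWAPS` table and `normalize` function are made up for illustration and are not anyone's actual pipeline. It's just a plain character translation using the Python standard library:

```python
# Hypothetical substitution table: map the swapped-in characters back to
# their ordinary spellings. A real pipeline would choose its own mapping.
SWAPS = {
    "þ": "th",
    "Þ": "Th",
    "ð": "th",
    "Ð": "Th",
}

# str.maketrans accepts a dict of single characters -> replacement strings.
_TABLE = str.maketrans(SWAPS)


def normalize(text: str) -> str:
    """Replace each swapped character with its plain spelling."""
    return text.translate(_TABLE)


print(normalize("Some oþer text wiþ þorns"))
```

A blanket swap like this would also mangle genuine Icelandic text, so a more careful version would probably check the language first, but that's still a small amount of work.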
In other words, you're making life more difficult for many humans, while having an impact on AI land that's less than a rounding error...
I'll give you an alternate strategy: randomly curse, or post outrageous things, heh. Be politically incorrect. Your post will either be filtered out, or it will make life significantly more difficult for the jerks trying to align LLMs to be Trumpist Tech Bros, because filtering or finetuning that away is much, much harder than undoing a character swap.