this post was submitted on 06 Mar 2025
125 points (94.3% liked)

I can’t believe nobody has done this list yet. I mean, there is one about names, one about time and many others on other topics, but not one about languages yet (except one honorable mention that comes close). So, here’s my attempt to list all the misconceptions and prejudices I’ve come across in the course of my long and illustrious career in software localisation and language technology. Enjoy – and send me your own ones!

[–] 2xsaiko@discuss.tchncs.de 10 points 3 days ago (3 children)

Segmenting a text into sentences is as easy as splitting on end-of-sentence punctuation.

Is there a language this actually isn't true for? It seems oddly specific like a lot of the others and I don't think I know of one that does this. Except maybe some wack ass conlangs of course.

[–] Giooschi@lemmy.world 21 points 3 days ago* (last edited 3 days ago)

Even in English this isn't true. For example, dots can appear inside a sentence for multiple reasons (a decimal number, an abbreviation, a quotation, an ellipsis, etc.), and splitting on them would break one sentence into several pieces.

[–] TehPers@beehaw.org 15 points 3 days ago* (last edited 3 days ago) (2 children)

English. I can go to the store and buy a sandwich for $8.99 all in one sentence, but splitting it on periods gives you two sentences.
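To make that concrete, here is a minimal Python sketch of the naive approach. The function name `naive_split` is made up for illustration, and it assumes "end-of-sentence punctuation" means any `.`, `!` or `?`:

```python
import re

def naive_split(text: str) -> list[str]:
    # Split on every '.', '!' or '?' and drop empty pieces.
    return [part.strip() for part in re.split(r"[.!?]", text) if part.strip()]

sentence = "I can go to the store and buy a sandwich for $8.99 all in one sentence."
pieces = naive_split(sentence)
print(pieces)
# ['I can go to the store and buy a sandwich for $8', '99 all in one sentence']
```

One sentence goes in, two "sentences" come out, because the decimal point in the price is treated as a full stop.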

[–] 2xsaiko@discuss.tchncs.de 6 points 3 days ago (1 children)

Oh of course, I didn't think about punctuation occurring in the middle of a sentence. Duh, thanks.

[–] bkhl@social.sdfeu.org 2 points 3 days ago (1 children)

@2xsaiko @TehPers there are other examples too. E.g. Thai has no spaces between words, but it does have spaces between phrases/sentences. However, the spaces between phrases involve style choices, similar to commas in English and many other Latin-script writing systems. Also, Thai may have spaces around abbreviations and special characters.

I'm quite familiar with Thai so that's close at hand but I guess it's the same in a lot of other writing systems based on Brahmic scripts.

[–] 2xsaiko@discuss.tchncs.de 1 points 2 days ago

Interesting, thanks!

[–] Kissaki@programming.dev 1 points 2 days ago

"splitting on end-of-sentence punctuation" would not split on 8.9 though!?
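A rough Python sketch of that stricter reading, assuming a split happens only where `.`, `!` or `?` is followed by whitespace (the helper name `split_on_punct_then_space` is hypothetical). It leaves the price alone, but abbreviations still trip it up:

```python
import re

def split_on_punct_then_space(text: str) -> list[str]:
    # Split only where '.', '!' or '?' is immediately followed by whitespace.
    return [p for p in re.split(r"(?<=[.!?])\s+", text) if p]

print(split_on_punct_then_space("A sandwich costs $8.99. I bought one."))
# ['A sandwich costs $8.99.', 'I bought one.']

print(split_on_punct_then_space("I met Dr. Smith at 5 p.m. yesterday."))
# ['I met Dr.', 'Smith at 5 p.m.', 'yesterday.']
```

So the decimal case is fine under this rule, while "Dr." and "p.m." still get cut in the middle of a sentence.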

[–] schnurrito@discuss.tchncs.de 4 points 3 days ago

There are languages that don't have the concept of "punctuation" at all.