
A fantastic media player, quite minimalistic and performant; it does what it's supposed to do!

Also has a fantastic commit where the author rants about locales: https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02... worth a read for some chuckles.




As strange as it is to say, I think avoiding problems like this might be one of the biggest productivity boosts from new languages like Go, Rust, Swift, etc. New ecosystems get a chance to “do over” the standard library and flush all the horrible legacy choices made before we knew better (locales, UTF16, etc).

The standard libraries in Zig, Go, Rust, and many others are miles ahead of the C standard library or the POSIX API. That is reason enough to use them.


I'm skeptical that Rust magically deals with, for example, character sets in 30-year-old subtitle files, in a way that makes C seem inadequate.

Legacy compatibility has value.


> I'm skeptical that rust magically deals with, for example, character sets in 30 year old subtitle files, in a way that makes C seem inadequate.

It's not just that C is "inadequate" - C and its standard library provide no assistance in that task. As the mpv author explains in profane detail in the linked commit message, POSIX locales are an active hindrance, not a useful form of "legacy compatibility".


Not "magically", but more reasonably, and without forcing your entire program into a different state, which breaks the ability of libraries to behave consistently across a huge range of functionality. C locale handling is basically impossible to work with robustly, even before you get into how it can't be used effectively in a thread-safe way.


The correct place to handle character sets is when you're reading the file, not to sprinkle it all throughout your program.


Right. And the Rust standard library provides (in my mind) the right API for this. Strings are always UTF-8 internally, and there are constructors for building strings from other encodings, such as UTF-16.

Rust isn’t unique. Swift, Go and Python 3 all expose more or less the same API. C’s standard library, with the benefit of hindsight, is uniquely terrible here.
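
For what it's worth, here is a minimal sketch of that "decode once at the boundary" style in Rust. `String::from_utf16` and `String::from_utf8_lossy` are standard library calls; the helper names `decode_utf16le` and `decode_utf8_or_lossy` are made up for illustration. Note that `from_utf16` takes 16-bit code units rather than raw bytes, so byte order is handled by hand here. Legacy single-byte charsets are not covered by the standard library and are out of scope for this sketch.

```rust
// Decode once at the I/O boundary; everything downstream sees plain UTF-8.
// Both helper names below are hypothetical, chosen for this example only.

/// Decode little-endian UTF-16 bytes into a String.
/// Returns None for an odd byte count or invalid surrogate pairs.
fn decode_utf16le(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    // Assemble 16-bit code units from byte pairs, low byte first.
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok()
}

/// Decode bytes as UTF-8, replacing invalid sequences with U+FFFD.
fn decode_utf8_or_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

The point is that the encoding is named at the call site and no process-wide setting is consulted.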


Locales are so much more than character sets. E.g. an Arabic locale changes the direction of writing, it also changes the characters used for numbers, and completely changes the way numbers and dates are formatted. This is where the C locale functions are problematic.

Character encoding is the easy and safe part.


Locales are much more than character sets, but the question was about character sets.

Also for most of those things, you want to be explicit about when to use the locale and when to not.


> Also for most of those things, you want to be explicit about when to use the locale and when to not.

Right. And that's where the POSIX C API falls down. The locale isn't named explicitly. It's not a function parameter. It's specified via a global variable that gets shared between all your threads.

You might think you can use scanf to parse a number in a JSON file. It might appear to work fine on your local computer. But scanf behaves differently depending on the system locale. You can wrap scanf with a helper function which sets the locale to something sensible, calls scanf, and restores the locale. But the locale is shared with other threads, which might depend on it in other ways, so this can introduce race conditions.

The whole thing is horribly designed - and it leads to buggy, unreliable code that is hard to reason about. Even in the best case, introducing thread synchronization into a function like sscanf will lead to a dramatic decrease in performance.

It's horrible. Just horrible.
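
To make the contrast concrete, here is a rough sketch of what "the locale is a function parameter" can look like, written in Rust. `str::parse::<f64>` is a standard library call and does not consult any locale; `parse_decimal` is a hypothetical helper invented for this example, and it models only the decimal separator, not a full locale.

```rust
/// Parse a decimal number whose separator is passed in explicitly.
/// Hypothetical helper: no global or per-thread state is read or written.
fn parse_decimal(input: &str, decimal_sep: char) -> Option<f64> {
    if decimal_sep == '.' {
        // Rust's built-in float parsing always expects '.'.
        return input.parse().ok();
    }
    // Under a non-'.' convention, reject a stray '.' instead of guessing.
    if input.contains('.') {
        return None;
    }
    input.replace(decimal_sep, ".").parse().ok()
}
```

A machine-readable format would always be parsed with `'.'`, while user-facing input passes whatever separator the caller has chosen. Since nothing is shared between threads, there is nothing to save, restore, or lock.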


You can create/use a different string processing library without jumping to a completely different language.


A long, informative read, with some profanities. Highly recommended!

25 years ago, Spolsky wrote an article called “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)”. Those of you who have only lived in the post-Unicode/UTF world might find that one informative as well.


> Imagine they had done this for certain other things. Like errno, with all the brokenness of the locale API.

They did. See for example time functions like localtime (and localtime_r) and tzset. This is admittedly locale-adjacent rather than part of the locale API itself. But the time zone is also global state, so it is impossible to get the time in a different time zone with standard APIs in portable (for POSIX) multi-threaded C code.
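
As a sketch of the alternative design, here is the zone passed as an ordinary argument, in Rust. This is a simplification under a loud assumption: the "zone" is just a fixed UTC offset in seconds, with no time zone database and no daylight saving rules. `wall_clock` is a hypothetical name, not a standard library function.

```rust
/// Wall-clock (hour, minute, second) for a Unix timestamp at a fixed UTC offset.
/// Hypothetical helper: the offset is an explicit argument, not global state.
fn wall_clock(unix_secs: i64, utc_offset_secs: i32) -> (u32, u32, u32) {
    let local = unix_secs + i64::from(utc_offset_secs);
    // rem_euclid keeps the result in 0..86_400 even for negative timestamps.
    let secs_of_day = local.rem_euclid(86_400);
    (
        (secs_of_day / 3_600) as u32,
        ((secs_of_day % 3_600) / 60) as u32,
        (secs_of_day % 60) as u32,
    )
}
```

Two threads can ask for the same instant in two different offsets at the same time, which is exactly what the tzset model makes awkward.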


> Both C locales and wchar_t are shitfucked retarded legacy braindeath. If the C/POSIX standard committee had actually competent members, these would have been deprecated or removed long ago. (I mean, they managed to remove gets().) To justify this emotional outbreak potentially insulting to unknown persons, I will write a lot of text. Those not comfortable with toxic language should pretend this is a religious text.

What a legend.


Worth noting that the author of that commit has not been associated with mpv development in years.


Which is a shame because he's highly technically competent.


In other words: they've kicked him out of his own project.


Wow, that commit message is so educational. I guess I'll pay more attention now to the standard functions I use in C code.

As for the weirdness of the C standard, I guess it is because they wanted to make it compatible with obscure proprietary platforms which might not even exist anymore.


> All in all, I believe this proves that software developers as a whole and as a culture produce worse results than drug addicted butt fucked monkeys randomly hacking on typewriters while inhaling the fumes of a radioactive dumpster fire fueled by chinese platsic toys for children and Elton John/Justin Bieber crossover CDs for all eternity.

This was a great read


Oh man! This was GOLD. Thanks.


Loved the last paragraph of the long, justified rant. Hilarious:

“All in all, I believe this proves that software developers as a whole and as a culture produce worse results than drug addicted butt fucked monkeys randomly hacking on typewriters while inhaling the fumes of a radioactive dumpster fire fueled by chinese platsic toys for children and Elton John/Justin Bieber crossover CDs for all eternity.”


I actually thought that last paragraph really undermined his case, because rather than substantiating like he did before, here he goes all out and just insults whoever he can think of; people who take it in the ass, greybeards, the Chinese, listeners of bland music...

I get him though. It's one of those writings from a foul mood. There was probably more going on in his life than some trouble dealing with locales.


To be fair, he has other issues than dealing with C locales. The author of that commit used to be the main developer behind mpv, until he decided to delete all support for GNOME in a single commit.


That GNOME even needs special support says it all. Does the commit have a similarly funny commit message?


> ...here he goes all out and just insults whoever he can think of...

No, he observes that software devs as a group, and as a culture tend to produce worse results than incredibly-distracted and certainly-fatally-intoxicated simians banging on typewriters.

It's a bit of hyperbole, but the overall state of software is absolutely dire.

> ...the Chinese, listeners of bland music...

In some-to-much of the world, it's pretty well-known that a lot of cheap crap (much of which has historically been made in China) is very shoddily made and fairly quickly finds its way to the landfill. One shouldn't confuse criticism of shoddily-made products for criticism of the citizens of the country of origin of said products.

I'd also expect the referenced (certainly entirely-hypothetical) CD to be something that ends up getting thrown into the dumpster in huge numbers because store inventory managers expect it to be WAY more popular than it actually ends up being. Also, see above about not getting confused about what the target of the insult is. ;)

> There was probably more going on in his life than some trouble dealing with locales.

shrug Not everyone chooses to write in sterile $DAYJOB-approved language when explaining in detail the root of their frustration with the absolutely bullshit garbage pile they have to build upon for their non-corporate side project.


> much of which has historically been made in China

The "Chinese crap" argument fails to realize that yes, all cheap crap is made in China, because everything is made in China.

Nobody looks at an iPhone and says "ugh, Chinese crap".


> Nobody looks at an iPhone and says "ugh, Chinese crap".

Well, the really important parts are Taiwanese crap. ;)

But (more seriously), the thing to remember is that the "Chinese crap" stereotype dates back to the days when China didn't have a notable electronics assembly industry... so nearly all the crap hitting US shores was cheap crap. Japan was the big Asian tech producer back then, and we still did a substantial bit of consumer (and industrial) electronics production in-country.


> No, he observes that software devs as a group, and as a culture tend to produce worse results than (...)

Yeah, to that I say "meh". Maybe. In my view, on a different day he would have hacked in a workaround, explained quickly that it is because of the illogical locale system, cited a few sources and moved on with his life. Sure, man-made stuff is a mess. Nothing's ever perfect. But stuff's particularly not perfect when you're in an absolutely foul mood.

It's unconstructive to entertain the thought that software in general is awful. A waste of energy. Reading his rant, I just think "improve it and move on"!

> One shouldn't confuse criticism of shoddily-made products for criticism of the citizens of the country of origin of said products.

Yeah, fair enough. There are different ways to interpret it. Maybe a proud modern Chinese person would be mildly offended by it. No biggie, the point was: in the last paragraph, he's firing a machine gun. A full release of rage.

> Not everyone chooses to write in sterile $DAYJOB-approved language

Of course. It feels great to talk bad when you're in a shit mood. I don't know about you though, but the next day I usually wish I'd just kept my cool. :-)


> In my view, on a different day he would have hacked in a workaround, explained quickly that it is because of the illogical locale system, cited a few sources and moved on with his life.

If you were this guy, sure. I advise you to carefully re-read the ~2,200 word essay contained in that commit message bearing foremost in mind that there exist people who intentionally write messages that make their frustration plain and obvious.

> No biggie, the point was: in the last paragraph, he's firing a machine gun.

No, that's a wrap-up, and it fits the tone of the rest of the essay.

> It feels great to talk bad when you're in a shit mood. I don't know about you though, but the next day I usually wish I'd just kept my cool.

1) I doubt that you're the sort of person to write an angry, in-depth ~2,200 word essay and then regret it the next day. To be clear, I expect that you would not put that much effort into writing something that clearly and frankly expresses your frustration.

2) Did you forget about this statement in the opening paragraph of the essay?

> To justify this emotional outbreak potentially insulting to unknown persons, I will write a lot of text. Those not comfortable with toxic language should pretend this is a religious text.

The tone is deliberate and intentional. Please adjust your worldview to include the existence of people who get angry about stupid bullshit and then write and publish in-depth, angry essays about exactly how stupid that bullshit is.



