Hacker News
The absolute minimum you must know about Unicode and encodings (joelonsoftware.com)
15 points by halb 6 months ago | 2 comments



(2003) Big discussions in:

2012 (214 points, 75 comments) https://news.ycombinator.com/item?id=3448507

2014 (96 points, 37 comments) https://news.ycombinator.com/item?id=6996500

2010 (61 points, 21 comments) https://news.ycombinator.com/item?id=1219065

2017 (57 points, 11 comments) https://news.ycombinator.com/item?id=13908703


IMO one of the pedagogical issues is that people who start with ASCII often assume that a character's byte representation (e.g. 0x48 for 'H') is always numerically the same as its code point (U+0048, i.e. 0x48 in hex or 72 in decimal), and vice versa. That happens to hold for ASCII, but not in general.

This leads to a mental model of:

    (bytes which are numbers) -> pictures
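
A quick Python illustration of why this model feels right at first: for plain ASCII text, the UTF-8 byte values and the code points are literally the same numbers, so nothing forces you to tell them apart. This is a minimal sketch; the sample string "Hi" is arbitrary.

```python
# For ASCII text, each character is one byte, and that byte's value
# equals the character's code point.
text = "Hi"

encoded = text.encode("utf-8")        # b'Hi'
byte_values = list(encoded)           # [72, 105]
code_points = [ord(c) for c in text]  # [72, 105]

print(hex(ord("H")), ord("H"))     # 0x48 72
print(byte_values == code_points)  # True

# "Vice versa" also works here: the numbers map straight back to text.
print(bytes(code_points).decode("ascii") == text)  # True
```

With pure ASCII input the two lists always match, which is exactly what makes the shortcut so easy to internalize.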

That breaks down when you get into UTF-8, which forces people to recognize an extra step:

    bytes -> numbers -> pictures
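
The extra step shows up as soon as a character falls outside ASCII. The sketch below decodes UTF-8 bytes into code points by hand, using a hypothetical helper `utf8_to_code_points`. It assumes well-formed input and skips the validation a real decoder performs, so in practice you would just call `bytes.decode("utf-8")`.

```python
def utf8_to_code_points(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into code points (the 'bytes -> numbers' step).

    Minimal sketch: assumes well-formed input and does not reject
    overlong forms, surrogates, or truncated sequences.
    """
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # 0xxxxxxx: 1-byte sequence (ASCII)
            length, value = 1, lead
        elif lead >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
            length, value = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
            length, value = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
            length, value = 4, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        # Each continuation byte (10xxxxxx) contributes 6 more bits.
        for cont in data[i + 1 : i + length]:
            value = (value << 6) | (cont & 0x3F)
        code_points.append(value)
        i += length
    return code_points


data = "Hé€".encode("utf-8")
print(list(data))                 # [72, 195, 169, 226, 130, 172]
print(utf8_to_code_points(data))  # [72, 233, 8364]
print([ord(c) for c in "Hé€"])    # [72, 233, 8364]
```

Here 'é' is one number (233) but two bytes (195, 169), and '€' is one number (8364) but three bytes, so byte values and code points no longer line up.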

And then there are code points, like combining accents, that might have no visual representation of their own but modify others:

    bytes -> numbers -> groups of numbers modifying each other -> pictures
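
A Python sketch of that last step: the same visible 'é' can be one code point or two, where U+0301 COMBINING ACUTE ACCENT is drawn on the preceding character rather than on its own. The `group_combining` helper is hypothetical and deliberately simplified: it only attaches combining marks to the previous character, whereas real grapheme-cluster segmentation (Unicode UAX #29) handles many more cases.

```python
import unicodedata

precomposed = "\u00e9"  # é as one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

# Different numbers, same picture once rendered.
print([hex(ord(c)) for c in precomposed])  # ['0xe9']
print([hex(ord(c)) for c in decomposed])   # ['0x65', '0x301']
print(precomposed == decomposed)           # False

# U+0301 is a nonspacing mark (category Mn).
print(unicodedata.name("\u0301"), unicodedata.category("\u0301"))
# COMBINING ACUTE ACCENT Mn

# NFC normalization composes the pair back into the single code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True


def group_combining(text: str) -> list[str]:
    """Attach each combining mark (category M*) to the preceding character.

    Simplified version of the 'groups of numbers' step; real grapheme
    clusters (Unicode UAX #29) also cover ZWJ emoji sequences, Hangul
    jamo sequences, flag pairs, and more, which this ignores.
    """
    groups: list[str] = []
    for ch in text:
        if groups and unicodedata.category(ch).startswith("M"):
            groups[-1] += ch  # modifier joins the previous group
        else:
            groups.append(ch)  # base character starts a new group
    return groups


word = "cafe\u0301"
print(len(word))                                # 5
print([len(g) for g in group_combining(word)])  # [1, 1, 1, 2]
```

So "café" written this way is five numbers but only four things you see, which is why counting code points and counting characters on screen can disagree.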



