Man, Unicode is one of those things that is both brilliant and absolutely absurd. Language is enormously complex, and making one system to rule them all ends up involving a lot of compromises. Unicode has metadata for each character, plus algorithms for normalization, capitalization, and sorting. With human language being as varied as it is, these algorithms can have really wacky results. Another good article on it is
And if you want to RENDER text, oh boy. Look at this:
Oh no, we've been hacked! There are Chinese characters in the event log! Or was it just Unicode?
The entire video is worth watching: it covers the history of "plain text" from the beginning of computing.
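To make the "wacky results" point about normalization, capitalization, and sorting a bit more concrete, here is a rough sketch using only Python's standard `unicodedata` module. The sample strings and variable names are my own picks for illustration, not taken from any of the linked material, and the sort shown is plain code-point ordering rather than the real Unicode Collation Algorithm.

```python
import unicodedata

# Two ways to write "é": one precomposed code point,
# or a plain "e" followed by a combining acute accent.
composed = "\u00e9"
decomposed = "e\u0301"

# They look identical on screen but compare unequal until normalized.
same_raw = composed == decomposed
same_nfc = unicodedata.normalize("NFC", decomposed) == composed
print(same_raw, same_nfc)
print(len(composed), len(decomposed))

# Per-character metadata: the official name and the general category.
print(unicodedata.name(composed))
print(unicodedata.category("\u0301"))  # "Mn" = nonspacing mark

# Capitalization can change the length of a string:
# German sharp s (U+00DF) uppercases to two letters.
word = "stra\u00dfe"
upper_word = word.upper()
print(upper_word, len(word), len(upper_word))

# Naive sorting compares code points, so every uppercase ASCII letter
# sorts before every lowercase one, and accented letters land at the end.
# (This is NOT locale-aware collation, just the default string ordering.)
naive_sorted = sorted(["zebra", "Zulu", "apple", "\u00e9clair"])
print(naive_sorted)
```

Getting "éclair" to sort next to "e" like a human would expect needs a proper collation library and a locale, which is exactly the sort of compromise-laden machinery I mean.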