The first time I learned about UTF-8 encoding, I was fascinated by how well-thought and brilliantly it was designed to represent millions of characters from different languages and scripts, and still be backward compatible with ASCII.
[…]Designing a system that scales to millions of characters and still be compatible with the old systems that use just 128 characters is a brilliant design.
↫ Vishnu Haridas
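The backward compatibility praised in the quote is easy to check for yourself. Here is a minimal Python sketch; the sample characters and variable names are picked purely for illustration and do not come from the linked article. It shows that plain ASCII text produces identical bytes under both encodings, while characters outside ASCII take two to four bytes each.

```python
# Plain ASCII text encodes to exactly the same bytes in ASCII and UTF-8.
ascii_text = "Hello"
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Illustrative samples: Latin capital A, e with acute accent, the euro sign,
# and an Egyptian hieroglyph (U+13000).
samples = ["A", "\u00e9", "\u20ac", "\U00013000"]

# Number of bytes each character occupies once encoded as UTF-8.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")
```

Any program that only understands ASCII can still read the first kind of text unchanged, which is the compatibility the quote is talking about.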
On a slightly related note, if you are ever bothered or annoyed by text online rendering as unknown squares, you most likely are just missing the proper fonts to render them. At least on most Linux and BSD systems, all you need to do is install the entire set of Noto fonts, including those for every single non-Latin script. Assuming your package manager has sane naming conventions, it’ll most likely come down to something like sudo dnf install google-noto*
or whatever your system’s install package command is, and after installing a whole slew of font files, your system will now be able to render virtually every script under the sun.
After installing this massive font set, you can do things like write and render in hieroglyphics, write Ea-nāṣir’s name the way it’s supposed to be written, and render all kinds of other scripts and symbols without having to look at one of those blank squares ever again.
That’s helpful, since I’ve got this clay tablet of a one-star review, but whoever wrote it just put a bunch of squares.
It’s a happy coincidence that ASCII was originally designed to work on systems with 7-bit chars, one bit short of an eight-bit byte.
I wasn’t really around at the time but I think the reason ASCII targeted 7 bits was to be compatible with data networks that only offered 7 data bits + 1 parity bit.
https://shubmehetre.com/posts/why-ascii-uses-7-bits/
It’s the same reason encodings like uuencode and base64 (think email) restrict their output to characters that fit in 7 bits.
That leaves a rather obvious way to extend the character set: use the unused eighth bit. It’s quite rare for things like this to happen, but it’s nice that it did, because otherwise it would have been necessary to break compatibility.
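To make the spare-bit point concrete, here is a small Python sketch. The helper `high_bit_set` and the sample strings are made up for illustration. It checks the top bit of every byte: in pure ASCII text it is always clear, and in UTF-8 every byte that belongs to a non-ASCII character has it set.

```python
def high_bit_set(byte: int) -> bool:
    """Return True when the eighth (most significant) bit of a byte is 1."""
    return byte & 0x80 != 0

# The same name, once with plain ASCII letters and once with diacritics.
ascii_bytes = "Ea-nasir".encode("utf-8")
mixed_bytes = "Ea-n\u0101\u1e63ir".encode("utf-8")

# Every ASCII byte leaves the eighth bit clear.
ascii_flags = [high_bit_set(b) for b in ascii_bytes]

# Only the bytes that encode the two non-ASCII letters set the eighth bit.
non_ascii_count = sum(high_bit_set(b) for b in mixed_bytes)

print(ascii_flags)
print(mixed_bytes.hex(" "), non_ascii_count)
```

Because ASCII never touches that bit, a decoder can tell a legacy single-byte character from part of a multi-byte sequence just by looking at it.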
Alfman,
I would even go so far as to say the Internet was a product of happy accidents.
If we had set out to “design” it, it would be a clunkier, much less useful system, and more locked down than the Chinese version of TikTok.