Nominally a book that covers the rough century between the invention of the telegraph in the 1840s and that of computing in the 1950s, The Chinese Typewriter is secretly a history of translation and empire, written language and modernity, misguided struggle and brutal intellectual defeat. The Chinese typewriter is ‘one of the most important and illustrative domains of Chinese techno-linguistic innovation in the 19th and 20th centuries … one of the most significant and misunderstood inventions in the history of modern information technology’, and ‘a historical lens of remarkable clarity through which to examine the social construction of technology, the technological construction of the social, and the fraught relationship between Chinese writing and global modernity’. It was where empires met.
What a delightful, intriguing piece, and food for thought. It highlights large cultural differences. Where the West went with roughly 26 inherently meaningless symbols and derives meaning from combinations of those symbols, the Far East defined meaning by representing each concept with its own unique character.
It’s curious to realise that we build representations of reality with only 26 building blocks, and expand the total canon of what we know by coming up with new strings of symbols drawn from those 26.
Yes, rather like RISC vs CISC.
That is a good metaphor and a good find.
Well, except for math…
And then emoji conquered the world…
Remarkably, emoji came from Japan, a country whose writing mixes logographic and syllabic scripts (kanji & kana).
Your comment gave me an idea: what about UTF-8? Specifically, the storage requirements for UTF-8 octets?
The Lord’s Prayer takes 338 octets in English (basic ASCII), 372 octets in simplified Chinese, 492 in modern Russian Cyrillic, and 702 octets in Greek, after stripping out extra spaces and verse numbers.
To express the same concepts, Chinese uses roughly 10% more storage than English. Given that each of those Unicode code points takes 3 or 4 octets in UTF-8, I’d say that’s a remarkably efficient computer transcription of written ideograms.
Greek is stuck with being multi-byte, alphabetic, case-based and inflected (verbs), along with liberal use of definite articles; that’s why it’s 208% the size of English in storage. Russian has no articles at all, so it’s 30% smaller than Greek, but it still has pervasive noun cases & verb inflections, so it’s 46% larger than English.
Oh, I think I just found my next project: how do a language’s characteristics affect its storage in UTF-8? Navajo, Hungarian, and Tibetan might be a good place to start.
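For anyone who wants to try the same measurement, here is a minimal sketch in Python. The function names are my own, and the whitespace handling is only an assumption about what "stripping out extra spaces" meant above; verse numbers would still have to be removed by hand.

```python
def utf8_octets(text: str) -> int:
    """Count the octets in the UTF-8 encoding of text.

    Runs of whitespace are collapsed to single spaces first
    (an assumed reading of "stripping out extra spaces").
    """
    normalized = " ".join(text.split())
    return len(normalized.encode("utf-8"))


def relative_size(octets: int, baseline: int) -> int:
    """Return octets as a whole-number percentage of baseline."""
    return round(100 * octets / baseline)


# Octet counts quoted in the comment above, not recomputed here.
counts = {"English": 338, "Chinese": 372, "Russian": 492, "Greek": 702}

for language, octets in counts.items():
    percent = relative_size(octets, counts["English"])
    print(f"{language}: {octets} octets, {percent}% of English")
```

Feeding the actual texts through utf8_octets should reproduce the counts, provided the same translations and the same cleanup are used.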
That is one of the nerdiest things I have read in a while. God speed!
Edited 2018-08-11 07:10 UTC
This article reinforced my perception that Chinese scripts are basically glorified Pictionary pages. If we Greeks were brave enough to get rid of Greek numerals (which are like Roman numerals but worse), the Chinese can get rid of their writing system. When a computer has to try and predict as much as possible, not as a convenience but to make typing possible, and a simple typewriter is impossible to make, something has gone wrong…
Edited 2018-08-10 21:39 UTC
or a computer-keyboard … they will soon vanish.
Chinese people just draw a character on the touchscreen of their phone – or start to draw it, and the AI suggests a few likely symbols and you pick one.
that is not limited to phones of course, but works with all kinds of touch-interfaces…
Sorry, is “typewriterable” now a measure of worth of a language? It’s much faster to read Chinese than other languages (assuming you learnt it), and because of the whole one character per word thing, you can fit a whole lot more information in the same space. So you read faster in both senses: words over time, and ideas conveyed over time.
I wonder if the rapid development of mechanical writing is partly due to the fact that it is cumbersome to deal with alphabet-like language in hand writing. If you have an inefficient written language, of course you’d want computers to do the hard stuff. If you have a written language optimized for hand writing, you’d be less inclined to want mechanical help.
Too many homophones? Sounds like they done goofed. Who’d thunk it – when you smurf your smurf, you get smurfed in the smurf. They’d better unsmurf their smurf if they don’t wish to be smurfed for smurf.
No, because languages don’t get designed. Not even well designed programming languages. Languages with too many syllables, where speech is necessarily rapid and it still takes a long time to say something simple, are the other extreme.
Navajo is exactly that case. Oblig: https://xkcd.com/257/
Also: https://en.wikipedia.org/wiki/Navajo_language#Grammar
Which is why written Korean retained “Hanja” (Chinese characters) for use in places like newspaper headlines in addition to their elegant “Hangul” system.
For those unfamiliar with it, the Korean “Hangul” writing system was designed at the behest of a Korean ruler to improve peasant literacy and, as such, is very elegant indeed. What appear to be characters represent syllables and each one is built by combining up to three “jamo” in an “initial consonant, vowel, final consonant” combination.
Essentially, the “jamo” are a phonetic alphabet whose letters are combined into pseudo-characters, one per syllable, so it combines the best aspects of an alphabet and a syllabary.
(e.g. conciseness. They have somewhere around 35 to 40 jamo, which is probably the same range we’d wind up with if we solved our “English spelling is a mess because we don’t have enough symbols for all our vowel sounds” problem.)
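The "initial consonant, vowel, final consonant" construction is regular enough that Unicode lays out the precomposed Hangul syllables arithmetically. A rough sketch in Python, where the function names and index arguments are my own illustration rather than anything from the article:

```python
import unicodedata

HANGUL_BASE = 0xAC00  # code point of the first precomposed syllable
VOWEL_COUNT = 21      # number of medial vowels in the Unicode layout
FINAL_COUNT = 28      # 27 final consonants plus "no final"


def compose(initial: int, vowel: int, final: int = 0) -> str:
    """Build one syllable block from jamo indices (Unicode ordering)."""
    offset = (initial * VOWEL_COUNT + vowel) * FINAL_COUNT + final
    return chr(HANGUL_BASE + offset)


def decompose(syllable: str) -> list[str]:
    """Split a syllable block into its conjoining jamo via NFD normalization."""
    return list(unicodedata.normalize("NFD", syllable))


# Initial index 18, vowel index 0, final index 4 gives the syllable "han".
block = compose(18, 0, 4)
print(block, [f"U+{ord(jamo):04X}" for jamo in decompose(block)])
```

One syllable block is a single code point, so it costs 3 octets in UTF-8 no matter whether it holds two jamo or three.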
Edited 2018-08-13 20:00 UTC
*facepalm*