OSX’s Dwindling Support for Third-World Languages

Guest post by Eisel Mazard 2012-06-14 OS News 43 Comments

The average computer user might think that the list of languages their operating system supports is pretty long. OSX supports 22 languages, and Microsoft claims to support 96, but they’re counting different regional dialects multiple times. But there are over 6000 languages, and though many of them are spoken by a dwindling few, there are some languages that are spoken by millions of people that are supported very poorly, if at all, by computer operating systems. The reason for the support being poor is that the people who speak those languages are poor, and are not good “markets.” It’s only because of the efforts of a few dedicated people that computing support for languages such as Burmese, Sinhalese, Pali, Cambodian, and Lao has been as good as it is, but the trends for the future are not good.

There was a time when I checked OSNews regularly because I was despairing at the impossibility of computing in various languages on a Mac. During the first five years of OSX, there was an open question of whether or not it would maintain its former advantage in language software or if open-source solutions (like Ubuntu) would gradually overtake it.

I’m not digressing to explain what the technical difficulties are because (in my experience) the vast majority of people who aren’t already engaged with these problems will be unwilling to read through an explanation of that kind. The general statement of disbelief I receive (when this comes up in conversation) is that a computer that can display Chinese should be able to display anything, i.e., because the brush-strokes of Chinese characters are so complicated (for the human hand and the human eye to deal with); however, the computer does not “see” language in the same way as a person does, nor does it “write” language in the manner of a human hand.

Counter-intuitive though it may be, it is much more technically complex for software to render a script with combining elements that need to be “drawn” in a slightly different way given the context of other letters surrounding a given glyph (and this is the case for Burmese, literary Sinhalese, etc.); the number of brushstrokes is irrelevant.
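For readers who do want a small glimpse of the problem, the following is a minimal sketch in Python (an illustration added here, not any vendor's rendering code). It only lists what is stored for two short syllables; the sample strings and the helper name `describe` are choices made for this example. The stored sequence is simple, and all of the hard work (deciding how each sign is reshaped or repositioned around its neighbours) is left to the font and the rendering engine.

```python
import unicodedata


def describe(text):
    """Return (code point, general category, Unicode name) for each stored character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# Burmese syllable "kra": stored as the consonant KA followed by MEDIAL RA,
# although the medial sign is drawn wrapping around the left of the consonant.
BURMESE_KRA = "\u1000\u103c"

# Sinhala "sri": five stored code points (letter, vowel-killer sign,
# zero width joiner, letter, vowel sign) that a correct font draws as one conjunct.
SINHALA_SRI = "\u0dc1\u0dca\u200d\u0dbb\u0dd3"

for sample in (BURMESE_KRA, SINHALA_SRI):
    for code_point, category, name in describe(sample):
        print(code_point, category, name)
    print()
```

Nothing in this listing says what the result should look like on screen. A Chinese character, however many strokes it has, is one code point mapped to one fixed drawing; here the drawing depends on the neighbours, which is where fonts and operating systems can (and did) go wrong.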

Inevitably, the quantitative difference in the number of millions of users for a given language entails a qualitative difference in what you can (and can’t) do (on a computer-screen) in a given language. We live in a world where there is more support for some video-games than many languages; and, indeed, there is sometimes more software developed for entirely fictional languages (originating in science-fiction novels, etc.) than there is support for languages of real historical and political importance.

During the last decade, my own computing (in Burmese, Sinhalese, Pali, Cambodian, Lao, and a number of other languages) depended almost entirely on the work of just one man: the software developer Ka’ōnohi Kai at the less-than-famous firm Xenotypetech.

I corresponded with Ka’ōnohi over a period of many years: there were several periods during which I wrote to him daily, sending in detailed error-reports, and suggestions for improvement, when the fonts were under development. Of course, the Unicode standard itself was still “under development” to some extent; and the Mac OS was much less stable than everyone wanted to pretend it was.

Of course, this doesn’t mean that Linux had a better solution at the time: I remember explaining technical difficulties with several languages to a Linux programmer, and he responded by dismissing what I said with a wave of his hand, insisting that “minor problems” of that kind would be (gradually and invisibly) resolved through programmers contributing fixes over the next five or ten years. The problem is that language computing can’t work that way: I wasn’t in a position to delay my own work by five or ten years, and nobody can publish a book with incorrect vowel-markings (or other “minor problems” that make the text incorrect and illegible) due to systemic errors that are beyond their control, and may get worse before they get better through gradual contributions of partial solutions (by unpaid volunteers).

Unlike contributions to open source projects of other kinds, language computing tends to benefit from (dare I say it?) planning, with clear deadlines, and professionally set standards and targets and (just imagine!) salaries being paid to the contributors doing the hard work. They also have to deal with error reports… and no, you can’t really expect those error reports to be delivered in English. In fact, for many languages, you need to actively go out in the field and solicit error reports from fluent speakers verbally (showing them examples of print on paper, and figuring out what’s not quite right with it) because too few of them will be online, or testing features themselves on computers. In every language, you also have a generation gap: teenagers may be delighted that they can type out the characters at all, whereas older people (not using computers) may be aware of problems that only arise in the literary form of a language (but not in simple vernacular) or that teenagers simply don’t care about.

The projects that eventually made it possible to compute in Cambodian on Linux did not resemble the idealized open-source methodology at all: they were bankrolled by government and U.N. donors, and operated not so differently from a government bureau themselves: they had desks, they had an office, they had a payroll, and, like institutions generally do, they took a long time to get results, but they did get results. Economically, the charity sector is dominated by Christian missionaries, and many other projects for languages of the poor have been taken on by (infamous) missionary groups like S.I.L. (about whom many moral indictments have already been written, such as the 1995 book Thy Will be Done, etc.). I was surprised to find that the government of China was actively collaborating with S.I.L. to produce language materials (and software!) in Yunnan; at the time I saw those materials, they were of abysmally poor quality, but hey, where there’s a paycheck, there’s a way (and if their funding truly has no limits to its patience, that work may get done eventually, too… or it may not).

My point here is limited: whether the work gets done by bible-vendors or by atheistic career bureaucrats, the institutional pattern here looks nothing like the open source model (even if the results are made freely available to all, etc.) –and this is itself part of why alternatives to Mac took such a long time to catch up with (pre-Unicode) fonts on OS 9 (yeah, I said it). There were indeed people who kept on using “legacy encodings” for the better part of that decade, for that same reason (and, yes, S.I.L. was among the agencies to produce conversion utilities, from legacy to Unicode, for languages that had no commercial incentive for software developers, etc. etc.).

My correspondence with Ka’ōnohi often dealt with minute questions of how exactly two glyphs should combine, or how a certain curl-of-the-pen in a character wasn’t quite the right shape. I described all of this verbally as I actually had no access to a scanner in those years. I was sometimes sending in these reports from monasteries in Sri Lanka, and, at other times, from a garret in Vientiane, and so on. Through a thousand tiny revisions, I became involved with the making of these typefaces; but all of the hard work was done by Ka’ōnohi himself, in Hawaii.

At some point in this long correspondence, Ka’ōnohi must have lost his objectivity because I convinced him, against his best interests, of the philological and pedagogical value of creating a font that could properly display the Aśokan-era inscriptions. With the full awareness that this product would have a total market of roughly ten scholars, Ka’ōnohi undertook the long hours of labor to make this possible, simply as a favor to me (or, perhaps, to reward whatever efforts he thought I had made in obsessing over revisions to so many other fonts, from modern Lao to medieval Khom).

The results remained imperfect for a rather sad reason, beyond anyone’s control: the technical support from Mac was wildly inconsistent from one version of the operating system to the next. During those years, Ka’ōnohi would scramble to “fix” all of his fonts every time Apple updated their software –although there had been nothing broken about them before.

This was exasperating to people all over the world: around the year 2001, the Mac had an enormous advantage over Windows in rendering the languages of the poor (just ask anyone who was typing in Tibetan in the 1990s; the difference was even more dramatic for languages that had zero commercial incentive for software developers in that era, like Khmer and Shan); however, in the space of just a few years of corporate indifference, this advantage was eroded, along with the patience of many people who had come to rely on the Mac.

Despite the uniformity that the Unicode standard was supposed to create, many people found that they had to re-type documents on their hard drives that had become incoherent (in updating from one version of O.S. 10 to the next) through no fault of their own. As the operating system became more sophisticated, its ability to deal with “marginal” languages actually decreased, and Aśokan script specifically went from difficult to impossible to deal with.

We never met in person, and I knew that it would be impossible for us to ever do so after he renounced his American citizenship in support of Hawaii’s native sovereignty. I now haven’t heard from him in several years, and I haven’t found any recent updates from him posted on the internet; my thanks now goes out to him without expecting to hear anything back. I was hoping to hear more from him after I switched from Southeast Asian languages to working on a language indigenous to “the new world” (scil. the Algonquian language Cree) but he seems to have disappeared.

To close with one short anecdote about the extent to which this problem goes unknown, unnoticed and unresolved: I remember talking to a technician at a Mac store when the OS made the transition from 10.2 to 10.3. I told him that I wanted to see the list of languages that were supported with the new version, and he flatly insisted that every language was already supported by the system. I then asked specifically about five different languages (each with tens of millions of speakers, not small or obscure languages, but, simply, languages spoken by the poor) and he was astounded to find that Mac could handle none of them (i.e., not even if you bought your own font, etc.). The basic attitude cultivated by players of video games is that a computer that can display Chinese, Japanese and Korean can (and does) display “everything”; however, the world is much larger than the nations that produce comic books that are read by computer programmers.

In 2012, computing in languages other than English remains at a very uncertain state: Mac’s OSX is now in its creaking (error-ridden) final years, and nobody knows what will replace it. Ubuntu is becoming a robust alternative, but if you actually want to publish a book in a given language, amateur-level support for computing in that language normally isn’t good enough (you can’t display combinations of letters in a manner that’s only approximately right). My own faint hope is that Google corporation’s longstanding interest in translation (currently directed toward online applications) will spin off (directly or indirectly) some kind of OS that is fundamentally polyglot.

It’s a faint hope: in many languages, the easiest way to communicate on a computer remains writing out your message in longhand, scanning the page, and then e-mailing the image as an attachment. This is still the method used by Cambodia’s (retired) King Sihanouk to make his statements to the public; and nobody would be in a position to tell him that he has really missed out on anything by not learning to type in the long sequence of barely-workable systems that have come and gone on computers during his lifetime.

43 Comments

2012-06-14 10:40 pm

tidux
Younger speakers of these languages may decide to just communicate in English when using computers rather than enduring the frustration of trying to get their native tongues working on their systems. As more and more of humanity’s interactions take place via computers, this marginalizes the “incompatible” languages. In the same way that English-only schools in the US nearly wiped out many Native American languages, the Internet is going to put the squeeze on the Third World. Who knows, we may get a global standard language this century after all.

2012-06-15 12:23 pm

zima
Thing is, for a longish period in the meantime, this marginalizes many, many “incompatible” people. Deprives them of a starting point, a foothold which enables or at least greatly eases going further. Excludes from, hampers overall advancement.

And so the digital divide could contribute to maintaining or even enlarging other divides.

Look at those three scripts for example – imagine how you would feel when seated in front of a computer using exclusively one of them, how it would impact your eagerness to explore its possibilities further.

http://en.wikipedia.org/wiki/File:Greekalphabet.svg

http://en.wikipedia.org/wiki/File:Romanian_Cyrillic_-_Lord‘s_Prayer_text.svg

http://en.wikipedia.org/wiki/File:Georgian_Alphabet_Georgia_Sample….

(even better, browse a bit http://el.wikipedia.org/ or http://bg.wikipedia.org/ or http://ka.wikipedia.org/ )

…and those are two official scripts of the EU, and the third one used in more or less European place, mostly still relatively close culturally to yours.

2012-06-15 9:45 pm

tidux
That’s a stupid example. Computers have worked properly with Greek and Cyrillic since what, the 80s?

2012-06-15 11:03 pm

zima
I see, you managed to miss how that wasn’t an example, but an analogy, and not about computers ability to display them…

And I have perhaps even more telling one: Blackletter (Gothic) script, used in Germany well into XX century (I have some books in it after grandfather), just more or less Latin alphabet – but hardly readable to somebody not used to it.

Sure, it easily works properly with computers …which doesn’t change its relative illegibility, its potential alienating qualities (if somebody would, say, switch just the fonts – not even the language – to Blackletter at your computer)

Edited 2012-06-15 23:05 UTC

2012-06-16 2:27 pm

tidux
Blackletter script doesn’t work well on computers because the font sizes would need to be enormous to be legible at current DPI. Maybe this new trend of >200 DPI screens will fix that.
2012-06-16 2:55 pm

zima
Blackletter doesn’t work well, isn’t very legible anywhere, also in print – why it was abandoned, and why I mentioned it as one more analogy to experience/imagine possible exclusionary qualities of alien scripts (like Latin ones are for tons of people)

Edited 2012-06-16 15:06 UTC
2012-06-18 2:57 am

mrstep
Well you were the one calling it an example in your previous post.

In any case, I’m sure there are many fantastic reasons to keep around all of the thousands of languages that were spawned because of the inability of people to cover meaningful geographic distances quickly, but there are probably more reasons supporting people being able to understand each other instead.

For “analogy” (I kid…), if you post in some Khmer dialect here, you’re unlikely to get many responses. It’s not only not much of a market for OS vendors, it’s limiting in terms of your own economic opportunities. So it goes.
2012-06-18 10:22 am

zima
That was more a common EN figure of speech (used by somebody for whom it’s a 3rd language, so less flexibility); or, at most, an example by analogy – not a simple direct one, like it was (a bit rudely) dismissed by the other poster.

Anyway, yes, efforts to improve communication and mutual understanding between people are great – but the point is, the initial tools meant to facilitate them better be in the language and script those people are fluent in. Otherwise, the results might be mixed…

(an example: once, for less than two months, I shared the dorm room with somebody who came from a far – and of course, to study, he should first learn the local language; there were no lessons in his native one, so he opted – before coming here, I presume – to attend Polish lessons for EN speakers; the thing is, he basically only pretended he has a grasp of EN in the first place; suffice to say… two months)

BTW, I think “spawned because of the inability of people to cover meaningful geographic distances quickly” does not cover it fully. Not only it’s easy to find places where closely neighbouring (even “packed” on quite small area) languages and scripts aren’t mutually intelligible. But also, for centuries at my place, it wasn’t even “neighbouring” but more like “intertwined” – people tend to self-segregate, it seems.

2012-06-14 11:05 pm

kragil
_Every_ country has a few rich people or a government who could fund better support for their language in Linux. Talk to them or just plan a kickstarter funding.

“It is not your fault that the world is like it is, it is only your fault if it stays that way.”
2012-06-14 11:36 pm

KLU9
the world is much larger than the nations that produce comic books that are read by computer programmers.

Oh no you di’n’t!

But seriously, fascinating article. Although I do feel it missed an opportunity, namely: why was language support so much better on older Mac OSes than now?

The implication in the article is that it isn’t commercially interesting to Apple to support them. But surely living languages like Burmese or Khmer are more commercially viable now than in the 1980s, not less.

Maybe there’s some greying ex-Apple employee who could be tracked down to offer the straight dope. Was the greater support back then the result of financial support? Some technical issue? A senior employee who just had a love of languages?

Fingers crossed for such a follow-up

2012-06-15 11:50 am

zima
But surely Apple isn’t the same company as back then…

They were pretty much the first with thorough and sane support for Polish alphabet (versus http://en.wikipedia.org/wiki/Mazovia_encoding simply exchanging similarly looking letters in firmwares, in original DOS code page, Ł instead of £ for example; to be fair, pretty much the only approach without access to source code; contributed to rarity of PL keyboards – we mostly just use US keyboard, right Alt as AltGr in combination with original Latin letter for diacritics).

And afterwards also with the first, IIRC, properly localised OS – when not only Macintoshes were prohibitively expensive (relatively, here; think in the range of annual salary), severely limiting their potential market, but also possibly still under CoCom embargo (as all 32-bit CPUs, I believe), at least formally. Also pretty much the only computers equipped often with PL keyboards.

That was of course relatively simple, compared to the issues from the article – “just” adding diacritics to few Latin letters, and quite straightforward translation into similarly structured script (mostly by some members of the relatively large Polish diaspora, I guess).

Still – yeah, why? Wishing to quickly take over DTP in then-emerging markets? (or education some time later, where proper localisation and keyboards were undoubtedly desirable; though there was possibly more behind that choice… http://www.osnews.com/thread?489120 – from 3rd section, “Furthermore”)

Maybe also to accommodate the needs of diaspora?

Anyway, now they are a company which openly states their aversion to target “lesser” poor people, aims for the “premium” ones…
2012-06-16 5:28 pm

Jaktar
“why was language support so much better on older Mac OSes than now”

Profiteering. As Thom already said, it takes paid workers to do all this stuff. In an effort to make another dollar, Apple is allowing this to happen. Their mantra of “Our way or the highway” pretty much sums it up. They already know what you want and they are giving it to you.
2012-06-18 3:13 am

mrstep
Why was the OS X support better? Well didn’t you hear: “Mac’s OSX is now in its creaking (error-ridden) final years”.

Huh? It’s significantly more stable and well tuned for multi-core/multi-processor work thanks to the great Snow Leopard release. Lion not so much… But there are a whole lot more creaking, error-ridden languages out there, and it sounds like some of them may be dead before OSX.

Anyway, I agree, the more relevant question would have been why they have been dropped from OS X and whether these areas can somehow promote the inclusion of their languages in OS releases. Given the likely lack of economic incentives and presumably (at least in the case of OS X which is targeted to Apple hardware) lower user base thanks to at least marginally higher cost, it’s maybe more surprising the support for these languages has lasted as long as it did. There’s a fair bit of extra overhead to keep those languages up to date, not to mention support for more esoteric character handling.

Very cool to see a mention of Hawaiian self-determination though. The U.S. history of treatment of the Philippines, Hawaii, etc. are truly disturbing, especially given how little of it is explained in the U.S. education system. Of course the Japanese attacked the U.S. at Pearl Harbor, but… Hey, that wasn’t actually part of America. How did a U.S. naval base end up there? How about in the Philippines? Fun stuff – so much dirty history that gets glossed over if mentioned at all.

2012-06-15 12:37 am

jburnett
Companies would support more languages if there was a profit in doing so. Therefore, either there is little to no demand or the cost to supply the demand is too high. If there are millions of customers who would like to have the service, it must be that the service is too expensive to deliver. Is this the case? Is it just that it costs too much engineer time to make a profit?

Or, is it because these languages are very difficult/different to describe in binary? A fraction of a millisecond can make all the difference for something as fundamental (and repeatedly called) as the font renderer. If adding support for difficult languages means degrading the performance, even if it just makes the system “feel” a tiny bit slower, then it makes sense to drop the language. After all, if there is one thing us comic book loving gamers love more than, well, comic books, it is performance.

2012-06-15 2:14 am

Soulbender
Or, is it because these languages are very difficult/different to describe in binary?

Describe in binary?

even if it just makes the system “feel” a tiny bit slower, then it makes sense to drop the language.

So we should drop all natural languages from computing then? Don’t be silly, the impact on users not using those languages would be negligible.

After all, if there is one thing us comic book loving gamers love more than, well, comic books, it is performance.

There’s a reason no-one takes comic book loving gamers seriously, especially when it comes to computing.

2012-06-15 2:38 am

jburnett

Or, is it because these languages are very difficult/different to describe in binary?

Describe in binary?

Computers only deal with binary. Everything else is an abstraction. Some things lend themselves to binary representation. Take the Latin alphabet for example. It has 26 letters (52 with upper/lower) and 10 digits. It can be described with a binary string of 6 bits, 7 if you want all the extra punctuation, 8 if you want all the symbols.

Alternatively, Chinese has a much larger alphabet, but as far as I know the characters are always rendered the same. So character no. 77 will always be rendered the same way.

The article said that in some of these other languages things cannot be described as easily. It implied that the way to render one character was based on the other characters around it.

Personally, I find this concept fascinating. I had never considered that the way I visually represent a sound/concept might be influenced by other concepts/sounds around it. Human creativity never ceases to amaze me.

That does not mean it would be easy to map such a system to an array of characters. This may not be the problem. Thus why my comment was titled “what is the problem?”

However, it does play into the next point.

even if it just makes the system “feel” a tiny bit slower, then it makes sense to drop the language.

So we should drop all natural languages from computing then? Don’t be silly, the impact on users not using those languages would be negligible.

No, don’t drop them from computing, just from the primary font rendering system. In computer graphics, negligible adds up quickly. You have to do a lot of calculations in a very small amount of time. Delay is perceived as slow or unresponsive. Even something as quick as a check to see which font rendering system to use can be expensive when done a lot.

After all, if there is one thing us comic book loving gamers love more than, well, comic books, it is performance.

There’s a reason no-one takes comic book loving gamers seriously, especially when it comes to computing.

Haha, if this was true, then this article would be talking about the great new font rendering system that handles some even more creative language. Instead, a large segment of the computer industry is driven by video games.

2012-06-15 12:09 pm

sorpigal
Computers only deal with binary. Everything else is an abstraction. Some things lend themselves to binary representation. Take the Latin alphabet for example. It has 26 letters (52 with upper/lower) and 10 digits. It can be described with a binary string of 6 bits, 7 if you want all the extra punctuation, 8 if you want all the symbols.

Your understanding is simplistic. Try representing all cursive script in 255 bytes. Quiz: How many different ways are there to write “g”? What about “q”? Do you realize that the answer for “q” will be *at least eight*?

Alternatively, Chinese has a much larger alphabet, but as far as I know the characters are always rendered the same. So character no. 77 will always be rendered the same way.

That depends highly on your definition of “the same.”

Personally, I find this concept fascinating. I had never considered that the way I visually represent a sound/concept might be influenced by other concepts/sounds around it. Human creativity never ceases to amaze me.

Do they still teach handwriting?

Write the following words in english long hand:

grotesque

Grotesque

Quiche

Petunia

How many *distinct* glyphs do you see?

2012-06-15 2:33 pm

jburnett
Your understanding is simplistic. Try representing all cursive script in 255 bytes. Quiz: How many different ways are there to write “g”? What about “q”? Do you realize that the answer for “q” will be *at least eight*?

Yes, but you can use any of those representations for ‘q’ and people will know it is the same letter. The original post made it sound like slightly altering the rendering changed the meaning. Thus, when the letter was changed by a new OS rendering engine, it was “unreadable.”

Do they still teach handwriting?

Write the following words in english long hand:

grotesque

Grotesque

Quiche

Petunia

How many *distinct* glyphs do you see?

I learned handwriting in two forms, print and cursive. Print I still use heavily, though I blend it with cursive a bit when scribbling notes really fast. Do I expect some vendor to support my personal script, no. Do they support a language almost identical and fully readable, yes. Heck, I don’t even like my script, I just cannot hand write as cleanly and quickly as a computer can render.

This discussion isn’t about writing words to look pretty or conform to some sense of artistic style. It is about being able to render a language so that it can be written/read by somebody who knows the language.
2012-06-15 2:43 pm

sorpigal
Yes, but you can use any of those representations for ‘q’ and people will know it is the same letter.

This happens to work in English.

The original post made it sound like slightly altering the rendering changed the meaning.

Exactly, in some languages it does. Imagine if software were to slightly alter each ‘e’ into ‘c’, which after all looks quite close. Imagine the confusion if this sort of error were common.

My attempt was to get you to understand by using a familiar analogy. Consider the difficulty of describing how to form the various glyphs, and the large number of such glyphs, needed for English cursive writing.

Do I expect some vendor to support my personal script, no. Do they support a language almost identical and fully readable, yes. Heck, I don’t even like my script, I just cannot hand write as cleanly and quickly as a computer can render.

Your personal way of rendering the letters when you write, your style if you will, is not what I’m talking about. I’m talking about the glyphs you use, or anyone uses, when writing cursive script. In order for a computer to represent cursive script it must know about all of the variations that we all use as a natural part of cursive writing. This is important as an aid to your understanding of the problem: Some languages do not have a non-cursive form and may in fact load grammatically critical information in to the bits and pieces between letters.

This discussion isn’t about writing words to look pretty or conform to some sense of artistic style. It is about being able to render a language so that it can be written/read by somebody who knows the language.

Yes it is, and I kept my comments firmly on that footing. I am not talking about stylistic variations, although it should be noted that these ought to be supported. If that’s what you took from my comments you’re harder to reach than I thought.

Edited 2012-06-15 14:45 UTC
2012-06-15 8:23 pm

westlake
This discussion isn’t about writing words to look pretty or conform to some sense of artistic style. It is about being able to render a language so that it can be written/read by somebody who knows the language.

But someone who knows the language and culture will care about appearance and style.

That is, after all, what made the Mac the platform of choice for what would become known as desktop publishing.

2012-06-15 2:32 am

NathanHill
This was an unusual article and something I hadn’t noticed before. I tend to get rid of all languages but English and Korean (and maybe Spanish) on my Macs. It seems like the list is longer than 22, but I have to admit I don’t pay that much attention. In the past when I tried to render Korean in Linux, I had absolutely no luck. It just didn’t work after installing what I thought I was supposed to install.

Anyway, thanks for writing this different perspective.
2012-06-15 2:38 am

ozonehole
If all you want is write a language phonetically correct, there is no problem. You can use the International Phonetic Alphabet (IPA):

http://en.wikipedia.org/wiki/International_Phonetic_Alphabet

You can write English with the IPA, and ditto for Russian, Chinese, Japanese, Tibetan, indeed every single language. It is far more phonetically accurate than the Roman alphabet, or Cyrillic, Hebrew, Arabic, etc.

Of course, it is not traditional, and thus may offend the sensibilities of those who think that their traditions are being trampled by modern society. Another disadvantage is that only a small percentage of the world’s population has even heard of the IPA, much less knows how to read it. Yet it is not difficult to learn. Study it for a few days or a week, and you’ve got it.

The vast majority of the world’s 6000 or so languages have no written script, so the IPA is ideal for those. However, Christian missionaries and language committees from the UN and elsewhere do not seem to be interested in spreading the IPA – they push Romanization if no ready traditional script is available.

The main problem with Romanization is that with only 26 letters available, it cannot represent every sound in every language. Indeed, the Roman alphabet isn’t well suited to English, because it only has five vowels. Thus, we are told in elementary school that English has five “short vowels” and five “long vowels” for a total of 10. In fact, there are 10 vowels in spoken English, and there is no such thing as a “short” or “long” vowel – it’s a band-aid approach to the problem that Latin had five spoken vowels and English has 10. In the IPA, the 10 vowels have 10 different symbols.

I understand that people like their traditions, but I don’t actually see much value in creating fonts for a traditional script that only 10 people in the world can read. If somebody wants to volunteer to do the work, then great, but don’t be surprised if software developers don’t jump in to enthusiastically support such efforts. Also don’t be surprised if Third World governments don’t come up with the funding for this – their scarce funds can probably be put to better use.

Edited 2012-06-15 02:45 UTC

2012-06-15 8:16 am

jalnl
Please, please, please, don’t go ranting on a topic you seem to know very little about. So many errors in there, I don’t have the time to point them out (ok, one: English has far more than 10 vowels).
2012-06-15 11:01 am

Radio
Unlike metrication, any reform in spelling should preferably take place over a long period of time in order to prevent confusion (freight=frate; eight=ate?). It should also be completely coherent, and the invention of new letters (vide the pseudo-Icelandic known as ITA) or the assumption of many diacritical marks, such as bespatter the pages of modern Slavonic texts, should, so far as possible, be avoided.

It was suggested – by, among others, G. B. Shaw – that a convenient method of revision would involve the alteration or deletion of one letter, or associated group of letters, per year, thus giving the populace time to absorb the change.

For example, in Year 1, that useless letter ‘c’ would be dropped to be replased by either ‘k’ or ‘s’, and likewise ‘x’ would no longer be part of the alphabet. The only kase in which ‘c’ would be retained would be in the ‘ch’ formation, which will be dealt with later. Year 2 might well reform ‘w’ spelling, so that ‘which’ and ‘one’ would take the same konsonant, wile Year 3 might well abolish ‘y’, replasing it with ‘i’, and Iear 4 might fiks the ‘g/j’ anomali wonse and for all.

Jeneralli, then, the improvement would kontinue iear bai iear, with Iear 5 doing awai with useless double konsonants, and Iears 6-12 or so modifaiing the vowlz and the rimeining voist and unvoist konsonants. Bai Ier 15 or sou, it wud fainali be posible tu meik ius ov thi ridandant letez ‘c’, ‘y’ and ‘x’ – bai now jast a memori in the maindz ov ould doderez – tu riplais ‘ch’, ‘sh’ and ‘th’ rispektivli.

Fainali, xen, aafte sam 20 iers of orxogrefkl riform, wi wud hev a lojikl, kohirnt speling in ius xrewawt xe Ingliy-spiking werld. Haweve, sins xe Wely, xe Airiy, and xe Skots du not spik Ingliy, xei wud hev to hev a speling siutd tu xer oun lengwij. Xei kud, haweve, orlweiz lern Ingliy az a sekond lengwij at skuul!

2012-06-15 12:15 pm

sorpigal
While this anecdote is funny, it is not a seriously useful suggestion. I recommend that anyone interested in English spelling reform consult his friendly, neighborhood Google and do some research. It’s a fascinating topic with many possibilities but no real probabilities.

My personal conclusion is that you can’t fix English orthography without making the result practically a different language and, even if you could, you won’t get buy-in from enough people to do it by fiat. It must be an extremely slow iterative process prosecuted by a growing pool of interested individuals across a timeframe of generations, by which I mean that if we started today I think a majority of speakers could be using what is effectively a fully reformed system in 200 years, but it will be necessary for most people to remain familiar with current conventions for at least twice as long as that.

2012-06-19 4:15 pm

zima
Curious thing about that anecdote – if one reads it in a ~Latin way (as at least most European ~Latin script languages seem to be pronounced), even by somebody who doesn’t know English, the result is quite… bearable. Certainly much more understandable than when the original EN orthography is read like that.

Well, OK, maybe it breaks down a bit in the last section

2012-06-15 12:25 pm

sorpigal
Depending on who you ask, English has between 41 and 47 distinct sounds. It is easily possible to represent this set within the confines of the glyphs from the English alphabet (which, I will remind you, is not the same as the Latin one!). Many proposals have been made for how this might be accomplished.

IPA is not a practical answer to written communication. It is concerned with how things sound, not what they mean. Forcing pronunciation into the script is a bad idea from a practicality standpoint and just doesn’t work long-term, as far as we can tell. You say təˈmeɪtoʊ, I say təˈmeɪtə, but we both read “tomato” and this is good.

Incidentally, here’s my current pet idea for overloading 26 letters: exploit a convention people already know and treat “h” (and only “h”) as special. Any character followed by h assumes an alternate pronunciation and otherwise is always pronounced the same way. Thus your short A can be “a” and your long A can be “ah,” just as “t” is distinct from “th.” “Bat” would remain the same but “father” would change slightly to “fahdher.” It’s still necessary to add more vowel characters, of course.
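
To make the “h” rule concrete, here is a minimal Python sketch of how a reader (or a program) might split a word into sound units under that convention. The function name split_units is made up for illustration, and the sketch assumes lowercase input and does nothing clever about words where “h” stands alone or is doubled.

```python
def split_units(word):
    """Split a word into units, pairing any letter with an 'h' that follows it.

    Hypothetical helper illustrating the "h is special" convention:
    'ah', 'th', 'dh' are each one unit; every other letter stands alone.
    """
    units = []
    i = 0
    while i < len(word):
        # A letter immediately followed by 'h' forms a two-letter unit.
        if i + 1 < len(word) and word[i + 1] == "h":
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


print(split_units("bat"))
print(split_units("fahdher"))
```

Under this rule “bat” splits into three single-letter units, while “fahdher” splits into f, ah, dh, e, r.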

2012-06-15 3:01 am

redshift
What is the percentage of Mac usage in third world countries? I am thinking it would be pretty low regardless of language support.
2012-06-15 5:56 am

Wodenhelm
Even when English gets various forms of dialect support, we’ll never, EVER see support for the Appalachian dialect, despite there being millions of speakers. I think that social prejudice is as much of an issue as “poor markets”.

2012-06-15 6:33 am

Morgan
Speaking as a native of the north Georgia mountains, which Appalachian dialect are you referring to? There are a couple of distinct ones just in my neck of the woods, and the further northeast you go the more you hear. My biological father was from the North Carolina mountains and his family’s dialect was vastly different from what people speak here.

2012-06-15 7:40 am

drstorm
WALL OF TEXT
2012-06-15 8:17 am

jalnl
Very interesting article, but it only talks about Linux and Mac OS/OS X. What about Windows?
2012-06-15 3:57 pm

MattPie
Just learn English, not a big deal.

Sent from my afterlife iPhone

(that’s a joke, friends)
2012-06-15 4:56 pm

kaiwai
From what it appears, the issue sounds more like one related to script support rather than language in and of itself – maybe the solution is to change the script. Recognise that the script was designed for an era of bamboo calligraphy and that necessity demands a simpler, cleaner and more straightforward script without all the elaborate bullshit that exists today.

As for niche languages – I know in the case of New Zealand the government worked with Microsoft to get Maori supported on Microsoft Windows. Although it wasn’t necessarily a simple operation, it was made a heck of a lot easier by relying on the Roman alphabet with a few modifications. It can be done; the question is whether there is the willpower to do so and the willingness to compromise when it comes to maybe designing an alphabet that is easier to represent on the computer.
2012-06-15 6:19 pm

Pro-Competition
Thank you for this very interesting article!

“I’m not digressing to explain what the technical difficulties are because (in my experience) the vast majority of people who aren’t already engaged with these problems will be unwilling to read through an explanation of that kind.”

Actually, this may not be true. I think this is a site where some of the readers would be interested in the technical details (including myself).

This is a subject that many of us (again including myself) have almost no knowledge of, even if we are interested, for the reasons you mentioned in the article.

Personally, I am very interested in preserving languages. This is a perfect example of a case where FOSS should shine – because there is very little financial incentive for commercial entities to support these languages.

Is there a (free) global information source on written languages? Or any promising projects that have begun work on this?
2012-06-15 7:02 pm

spiderman
I’m willing to help improve support for those languages.

Could you please provide some pointers to what has been done so far (you said there was incomplete support)?

Are there some (image) documents that explain the specifications of the language?

Please tell me what I can do to help.
2012-06-15 9:20 pm

kloty
Just think about voice recognition systems like Siri. How many languages will be supported by such systems? I wrote an article on this topic: http://technokloty.blogspot.de/2011/10/how-speech-recognition-endan…
2012-06-16 12:24 am

benrett
Home computer ownership, even of an old, well out-of-date desktop PC, is completely unaffordable, rare and out of reach for the majority of people in developing countries.

There is a big and successful trade in offloading used and out-of-date mobile phones and motor vehicles from developed to developing countries, but this does not seem to be the case for old computer equipment, and I am not sure why. Maybe it’s not financially viable to ship the bulky equipment while keeping it in working order?
2012-06-16 10:42 am

steve_s
For an article that’s purporting to talk about Mac OS X’s “dwindling” support for different languages, I had expected to see some specific discussion about how the number of languages OS X supports has reduced. If that was in this article, I missed it.

I found just a single specific statement that says that Mac OS X supports 22 languages. That is incorrect. Mac OS X Lion and Mountain Lion both include translations for 30 languages. Technically, Mac OS X supports many more, since it lets users pick from about 140 different languages. For example, an OS X user can set their preferred language to Tagalog and, whilst they won’t see the OS itself in that language, if they run software that has included a Tagalog translation then that’s what they’ll see.

Mac OS X has supported Unicode encoding since its inception, and is perfectly capable of rendering text encoded as UTF-8 and UTF-16, as well as supporting fonts including the full range of Unicode glyphs. Indeed, the OS’s font rendering system is smart enough to go looking for glyphs in different fonts should the currently selected font not include a glyph.
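
As a rough illustration of the encoding side of this (a sketch only, not something taken from the article), the Python snippet below stores a short Burmese sample string as UTF-8 and as UTF-16 and shows that both decode back to the same code points. The sample word is my own choice. Note what it does not show: whether those code points are then drawn correctly is a matter for the font and the text-shaping engine, which is a separate question from encoding support.

```python
import unicodedata

# Sample string chosen for illustration: "Myanmar" in Burmese script,
# written as explicit code points so the source file stays ASCII.
text = "\u1019\u103c\u1014\u103a\u1019\u102c"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte-order mark

# Both encodings round-trip to the identical sequence of code points.
assert utf8_bytes.decode("utf-8") == text
assert utf16_bytes.decode("utf-16-le") == text

# List each code point with its Unicode character name.
for ch in text:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Code point count, then byte counts for each encoding.
print(len(text), len(utf8_bytes), len(utf16_bytes))
```

The stored bytes differ between the two encodings, but the text they represent is the same; any rendering trouble starts after this step.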

An additional major problem in dealing with alternate languages is support for input sources to match up with differing languages. This is a subject that the article doesn’t discuss at all. Mac OS X includes support for dozens of different input methods, and this is user extensible.

From where I’m sitting, it looks like Mac OS X has excellent support for languages.

Anecdotal rants about how “many people found that they had to re-type documents” give us no clue as to the truth of the situation. Were these people using Unicode, or were they using older legacy encodings that had since got dropped? Why were their old documents rendered illegible? I have no idea.

I’m not saying the author is wrong, but if it’s really the case that Mac OS X has stopped supporting some languages then it would have been useful to provide some clear examples saying “in Mac OS X 10.4, language X was supported, but was removed in 10.5”. Instead this article seemed to be less about Mac OS X and more about attempting to encourage Ubuntu to improve its language support.
2012-06-17 3:51 am

unclefester
The reality is that the vast majority of people who use a rare language can also use a reasonably popular language, e.g. every adult who speaks Scots Gaelic also speaks fluent English.

Around 80% of people in the world speak either Arabic, Spanish, Hindi-Urdu, English, Mandarin or Cantonese as their primary language.

2012-06-17 12:34 pm

spiderman
But that is because Scotland is an English colony, isn’t it? Not all regions are English colonies.

2012-06-17 11:29 pm

unclefester
Scotland has never been an English colony. The majority of Scots supported England during the 1745 Rebellion and were in favour of forming the United Kingdom.

Don’t take your history from Braveheart (Mel Gibson hates the English). William Wallace was a French-speaking Norman nobleman, not a poor Scottish farmer.

Most Scots (like my ancestors) are of Scandinavian heritage – not Celts. The common language of Scotland has been Lallans (a form of Anglo-Saxon) for almost 1500 years.

2012-06-19 4:17 pm

zima
So… he was of Scandinavian heritage too, right?

PS. Is Highlander better?

Edited 2012-06-19 16:18 UTC