posted by Eisel Mazard on Thu 14th Jun 2012 22:01 UTC
The average computer user might think that the list of languages their operating system supports is pretty long. OSX supports 22 languages, and Microsoft claims to support 96, but they're counting different regional dialects multiple times. But there are over 6000 languages, and though many of them are spoken by a dwindling few, there are some languages spoken by millions of people that are supported very poorly, if at all, by computer operating systems. Support is poor because the people who speak those languages are poor, and are not good "markets." It's only because of the efforts of a few dedicated people that computing support for languages such as Burmese, Sinhalese, Pali, Cambodian, and Lao has been as good as it is, but the trends for the future are not good.

There was a time when I checked OSNews regularly because I was despairing at the impossibility of computing in various languages on a Mac. During the first five years of OSX, there was an open question of whether or not it would maintain its former advantage in language software or if open-source solutions (like Ubuntu) would gradually overtake it.

I won't digress to explain what the technical difficulties are because (in my experience) the vast majority of people who aren't already engaged with these problems will be unwilling to read through an explanation of that kind. The general statement of disbelief I receive (when this comes up in conversation) is that a computer that can display Chinese should be able to display anything, because the brush-strokes of Chinese characters are so complicated (for the human hand and the human eye to deal with); however, the computer does not "see" language in the same way as a person does, nor does it "write" language in the manner of a human hand.

Counter-intuitive though it may be, it is much more technically complex for software to render a script with combining elements that need to be "drawn" in a slightly different way given the context of other letters surrounding a given glyph (and this is the case for Burmese, literary Sinhalese, etc.); the number of brushstrokes is irrelevant.
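For readers who do want a glimpse of the difference, here is a minimal sketch in Python, using only the standard `unicodedata` module. The two sample strings are my own illustrative choices, not examples from the correspondence described here, and the helper names `describe` and `count_marks` are hypothetical. The sketch only inspects how the text is stored; it does not render anything.

```python
import unicodedata

# Illustrative sample strings (chosen for this sketch, not taken from the article).
CHINESE = "\u732b"  # a single Chinese character, stored as one code point
BURMESE = "\u1000\u103c\u1031\u102c\u1004\u103a"  # one Burmese word, six code points


def describe(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]


def count_marks(text):
    """Count code points whose general category is a mark (Mn, Mc or Me)."""
    return sum(1 for ch in text if unicodedata.category(ch).startswith("M"))


if __name__ == "__main__":
    for label, sample in (("Chinese", CHINESE), ("Burmese", BURMESE)):
        print(label, "-", len(sample), "code point(s),", count_marks(sample), "mark(s)")
        for row in describe(sample):
            print("   ", *row)
```

The Chinese character, however many strokes it has, is a single code point that maps to a single pre-drawn glyph. The Burmese word is a sequence of letters and dependent marks stored in logical order, which is not the order in which they are drawn: the vowel sign U+1031 is stored after its consonant but appears to the left of it, and the medial U+103C is drawn around the consonant. Getting that right is the job of the font and the shaping software together, which is why a change in either can break text that displayed correctly before.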

Inevitably, the quantitative difference in the number of millions of users for a given language entails a qualitative difference in what you can (and can't) do (on a computer-screen) in a given language. We live in a world where there is more support for some video-games than many languages; and, indeed, there is sometimes more software developed for entirely fictional languages (originating in science-fiction novels, etc.) than there is support for languages of real historical and political importance.

During the last decade, my own computing (in Burmese, Sinhalese, Pali, Cambodian, Lao, and a number of other languages) depended almost entirely on the work of just one man: the software developer Ka'ōnohi Kai at the less-than-famous firm Xenotypetech.

I corresponded with Ka'ōnohi over a period of many years: there were several periods during which I wrote to him daily, sending in detailed error-reports, and suggestions for improvement, when the fonts were under development. Of course, the Unicode standard itself was still "under development" to some extent; and the Mac OS was much less stable than everyone wanted to pretend it was.

Of course, this doesn't mean that Linux had a better solution at the time: I remember explaining technical difficulties with several languages to a Linux programmer, and he responded by dismissing what I said with a wave of his hand, insisting that "minor problems" of that kind would be (gradually and invisibly) resolved through programmers contributing fixes over the next five or ten years. The problem is that language computing can't work that way: I wasn't in a position to delay my own work by five or ten years, and nobody can publish a book with incorrect vowel-markings (or other "minor problems" that make the text incorrect and illegible) due to systemic errors that are beyond their control, and may get worse before they get better through gradual contributions of partial solutions (by unpaid volunteers).

Unlike contributions to open source projects of other kinds, language computing tends to benefit from (dare I say it?) planning, with clear deadlines, and professionally set standards and targets and (just imagine!) salaries being paid to the contributors doing the hard work. They also have to deal with error reports... and no, you can't really expect those error reports to be delivered in English. In fact, for many languages, you need to actively go out in the field and solicit error reports from fluent speakers verbally (showing them examples of print on paper, and figuring out what's not quite right with it) because too few of them will be online, or testing features themselves on computers. In every language, you also have a generation gap: teenagers may be delighted that they can type out the characters at all, whereas older people (not using computers) may be aware of problems that only arise in the literary form of a language (but not in simple vernacular) or that teenagers simply don't care about.

The projects that eventually made it possible to compute in Cambodian on Linux did not resemble the idealized open-source methodology at all: they were bankrolled by government and U.N. donors, and operated not so differently from a government bureau themselves: they had desks, they had an office, they had a payroll, and, like institutions generally do, they took a long time to get results, but they did get results. Economically, the charity sector is dominated by Christian missionaries, and many other projects for languages of the poor have been taken on by (infamous) missionary groups like S.I.L. (about whom many moral indictments have already been written, such as the 1995 book Thy Will be Done, etc.). I was surprised to find that the government of China was actively collaborating with S.I.L. to produce language materials (and software!) in Yunnan; at the time I saw those materials, they were of abysmally poor quality, but hey, where there's a paycheck, there's a way (and if their funding truly has no limits to its patience, that work may get done eventually, too... or it may not).

My point here is limited: whether the work gets done by bible-vendors or by atheistic career bureaucrats, the institutional pattern here looks nothing like the open source model (even if the results are made freely available to all, etc.) --and this is itself part of why alternatives to Mac took such a long time to catch up with (pre-Unicode) fonts on OS 9 (yeah, I said it). There were indeed people who kept on using "legacy encodings" for the better part of that decade, for that same reason (and, yes, S.I.L. was among the agencies to produce conversion utilities, from legacy to Unicode, for languages that had no commercial incentive for software developers, etc. etc.).

My correspondence with Ka'ōnohi often dealt with minute questions of how exactly two glyphs should combine, or how a certain curl-of-the-pen in a character wasn't quite the right shape. I described all of this verbally as I actually had no access to a scanner in those years. I was sometimes sending in these reports from monasteries in Sri Lanka, and, at other times, from a garret in Vientiane, and so on. Through a thousand tiny revisions, I became involved with the making of these typefaces; but all of the hard work was done by Ka'ōnohi himself, in Hawaii.

At some point in this long correspondence, Ka'ōnohi must have lost his objectivity because I convinced him, against his best interests, of the philological and pedagogical value of creating a font that could properly display the Aśokan-era inscriptions. With the full awareness that this product would have a total market of roughly ten scholars, Ka'ōnohi undertook the long hours of labor to make this possible, simply as a favor to me (or, perhaps, to reward whatever efforts he thought I had made in obsessing over revisions to so many other fonts, from modern Lao to medieval Khom).

The results remained imperfect for a rather sad reason, beyond anyone's control: the technical support from Mac was wildly inconsistent from one version of the operating system to the next. During those years, Ka'ōnohi would scramble to "fix" all of his fonts every time Apple updated their software --although there had been nothing broken about them before.

This was exasperating to people all over the world: around the year 2001, the Mac had an enormous advantage over Windows in rendering the languages of the poor (just ask anyone who was typing in Tibetan in the 1990s; the difference was even more dramatic for languages that had zero commercial incentive for software developers in that era, like Khmer and Shan); however, in the space of just a few years of corporate indifference, this advantage was eroded, along with the patience of many people who had come to rely on the Mac.

Despite the uniformity that the Unicode standard was supposed to create, many people found that they had to re-type documents on their hard drives that had become incoherent (in updating from one version of O.S. 10 to the next) through no fault of their own. As the operating system became more sophisticated, its ability to deal with "marginal" languages actually decreased, and Aśokan script specifically went from difficult to impossible to deal with.

We never met in person, and I knew that it would be impossible for us to ever do so after he renounced his American citizenship in support of Hawaii's native sovereignty. I now haven't heard from him in several years, and I haven't found any recent updates from him posted on the internet; my thanks now go out to him without expecting to hear anything back. I was hoping to hear more from him after I switched from Southeast Asian languages to working on a language indigenous to "the new world" (scil. the Algonquian language Cree) but he seems to have disappeared.

To close with one short anecdote about the extent to which this problem goes unknown, unnoticed and unresolved: I remember talking to a technician at a Mac store when the OS made the transition from 10.2 to 10.3. I told him that I wanted to see the list of languages that were supported with the new version, and he flatly insisted that every language was already supported by the system. I then asked specifically about five different languages (each with tens of millions of speakers, not small or obscure languages, but, simply, languages spoken by the poor) and he was astounded to find that Mac could handle none of them (i.e., not even if you bought your own font, etc.). The basic attitude cultivated by players of video games is that a computer that can display Chinese, Japanese and Korean can (and does) display "everything"; however, the world is much larger than the nations that produce comic books that are read by computer programmers.

In 2012, computing in languages other than English remains in a very uncertain state: Mac's OSX is now in its creaking (error-ridden) final years, and nobody knows what will replace it. Ubuntu is becoming a robust alternative, but if you actually want to publish a book in a given language, amateur-level support for computing in that language normally isn't good enough (you can't display combinations of letters in a manner that's only approximately right). My own faint hope is that Google's longstanding interest in translation (currently directed toward online applications) will spin off (directly or indirectly) some kind of OS that is fundamentally polyglot.

It's a faint hope: in many languages, the easiest way to communicate on a computer remains writing out your message in longhand, scanning the page, and then e-mailing the image as an attachment. This is still the method used by Cambodia's (retired) King Sihanouk to make his statements to the public; and nobody would be in a position to tell him that he has really missed out on anything by not learning to type in the long sequence of barely-workable systems that have come and gone on computers during his lifetime.
