Wikipedia editors have implemented new policies and restricted a number of contributors who were paid to use AI to translate existing Wikipedia articles into other languages after they discovered these AI translations added AI “hallucinations,” or errors, to the resulting article.
↫ Emanuel Maiberg at 404 Media
There seems to be this pervasive conviction among Silicon Valley techbro types, and many programmers and developers in general, that translation and localisation are nothing more than basic find/replace tasks that you can automate away. At first, we just needed to make corpora of two different languages kiss and smooch, and surely that would automate translation and localisation away if the corpora were large enough. When this didn’t turn out to work very well, they figured that if we made the words in the corpora tumble down a few pachinko machines and then made them kiss and smooch, yes, then we’d surely have automated translation and localisation.
Nothing could be further from the truth. As someone who has not only worked as a professional translator for over 15 years, but who also holds two university degrees in the subject, I keep reiterating that translation isn’t just a dumb substitution task; it’s a real craft, a real art, one you can have talent for, one you need to train for, and study for. You’d think anyone with sufficient knowledge of two languages can translate effectively between the two, but without a much deeper understanding of language in general and the languages involved in particular, as well as a deep understanding of the cultures in which the translation is going to be used, and a level of reading and text comprehension that goes well beyond that of most, you’re going to deliver shit translations.
Trust me, I’ve seen them. I’ve been paid good money to correct, fix, and mangle something usable out of other people’s translations. You wouldn’t believe the shit I’ve seen.
Translation involves the kinds of intricacies, nuances, and context “AI” isn’t just bad at, but simply cannot work with in any way, shape, or form. I’ve said it before, but it won’t be long before people start getting seriously injured – or worse – because of the cost-cutting in the translation industry, and the effects that’s going to have on, I don’t know, the instruction manuals for complex tools, or the leaflet in your grandmother’s medications.
Because some dumbass bean counter kills the budget for proper, qualified, trained, and experienced translators, people are going to die.
Exhibit one: The Legend of Zelda
It was translated, I assume, using a Japanese/English dictionary, and it shows. Some lines are okay, some cringe, and some confusing, but at least the dictionary was penned by a pro. My favorite is: “Master using it[,] take this” which can be read with Master as a noun instead of a verb. The newer translation is “Take this[,] master using it”.
Pfaffa,
Yes, I suspect we’ve all seen examples of bad translations. In the case of older games it probably was done by human translators who just weren’t very proficient in English.
This one makes me laugh…
“All your base are belong to us”
With Wikipedia, as cited in the article, there’s just no realistic way to hire enough humans to translate all of its text into every language it wants to serve. Without AI, only a tiny fraction of the content would be available in other languages, and most would be completely unavailable. Even if there were enough labor, most companies are unwilling to pay more to hire humans even if they are more proficient.
Thom, I am sympathetic with your view that the AI is not on par with human pros. As a developer I am frequently frustrated by corporate cost-saving measures that decrease the quality of the product. I see this with most of my clients and it’s been one of my big gripes in the industry. We can highlight these problems and maybe it’s important just to talk about it, but is there a solution? I genuinely don’t know what it is.
I’d argue that part of the solution is requiring that any form of machine translation that’s not human-vetted be applied by the end-user using something like Google Translate. That way, expectations are more realistically set.
(Similar to how, when I privately goof about with Stable Diffusion, I use a copy running on my own PC and that keeps me aware of how much energy it’s consuming and how much heat it’s generating… as well as letting me do it more in the winter than the summer so I can use what would otherwise be waste heat to work with my furnace instead of against my or someone else’s air conditioner.)
As for “human-vetted”, I’d say it’s certainly possible to be good enough for certain situations. I’ve read various Japanese-only things on Pixiv by OCRing them, hand-fixing the mis-OCRed characters using the character picker on Jisho, and then juggling whitespace (and using cut/paste to “temporarily hide parts of the text”) in Google Translate until I get a sense for which meanings are actually there and which ones are just it tripping over idioms or onomatopoeia.
For non-life-critical stuff, we are at a point where a responsible user can trade off time for knowledge of the source language. (To a certain point. I do need a touch of intuition from having watched a bunch of subtitled anime and a bookmarked guide to Japanese onomatopoeia.)
I am much more concerned by AI being used in war than by AI stealing my job. Pervasive surveillance and automated killing are being normalized. I worry about this aspect a lot more than job market disruptions or grandma’s label on her medication.
For someone who doesn’t use AI, you sure seem to know an awful lot about what AI can and can’t do. I read the article in Serbian – the same language I’m writing this comment in. I dunno. Seemed pretty idiomatic and accurate to me.
LLMs are verisimilitude machines. As spicy autocomplete, they prioritize looking convincing, which means they’re very good at lulling you into not double-checking that their output is accurate every time you use them.
I have to disagree with the analysis here.
Even if done with AI, the spreading of knowledge and information is much more important than a grammatical error.
We have seen what happens if we leave it to people to translate the content on Wikipedia: an Anglo-centric knowledge base. Let’s use AI tooling to translate the content en masse. The community of members can then tweak or correct issues as they go. As ssokolow said, put up a banner to highlight that it was not human-validated and we are good.
I’m sure we can all agree it’s much easier to fix an error than to write the document from scratch.