Speech-to-speech translation is one of the most difficult language-related activities you can engage in. I study English in Amsterdam, and part of that study is of course spent on translating Dutch material into English; even though we focus on text-to-text translation, we've been given glimpses of speech-to-speech translation as well, and this has made me aware of the incredible knowledge and mental agility it requires. You need to know both the source and the target language inside out, and especially when the translating is done 'live', you need to be able to keep up with the speakers. Despite all this, translating can be a very soothing activity; I find almost nothing as comfortable as sitting behind my computer, with several paper dictionaries scattered around my keyboard, stumbling from fixed expression to fixed expression, from saying to saying.
According to ICT Results, the TC-STAR project, "the first project in the world addressing unrestricted speech-to-speech translation", needs to perfect three key technologies in order to operate properly. Automatic Speech Recognition (ASR) transcribes the spoken words into text, Spoken Language Translation (SLT) translates that text from the source language into the target language, and Text to Speech (TTS) finalises the process by turning the translated text back into speech.
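To make the three-stage chain a bit more concrete, here is a minimal Python sketch of how ASR, SLT and TTS could be wired together. Everything in it is an assumption for illustration: the function names, the toy Spanish-to-English lexicon and the use of plain bytes as "audio" are hypothetical stand-ins, not TC-STAR's actual components or API.

```python
# Hypothetical stand-ins for the three stages; none of this is TC-STAR code.
# "Audio" is faked as UTF-8 bytes so the sketch runs without any models.

TOY_LEXICON = {"hola": "hello", "mundo": "world"}  # assumed toy dictionary


def recognise_speech(audio: bytes) -> str:
    """ASR stage: audio in, source-language text out (stubbed as a decode)."""
    return audio.decode("utf-8")


def translate_text(source_text: str) -> str:
    """SLT stage: word-by-word lookup; unknown words pass through unchanged."""
    words = source_text.lower().split()
    return " ".join(TOY_LEXICON.get(word, word) for word in words)


def synthesise_speech(target_text: str) -> bytes:
    """TTS stage: target-language text in, audio out (stubbed as an encode)."""
    return target_text.encode("utf-8")


def speech_to_speech(audio: bytes) -> bytes:
    """Chain the stages in order: ASR, then SLT, then TTS."""
    return synthesise_speech(translate_text(recognise_speech(audio)))
```

Because each stage only consumes the output of the previous one, an error made early on (a misheard word, say) is passed straight down the chain, which is one reason all three technologies need to be good for the whole to work.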
While none of these technologies is new, none of them is anywhere near perfect either. To improve the output of each stage, the TC-STAR project combined several ASR and SLT systems, which made the output considerably more accurate. The system translated speech between Spanish and English, as well as radio broadcasts from Chinese to English (which is probably all the more impressive).
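The article does not say how the outputs of the different systems were combined. One simple possibility, shown here purely as an illustration and not as TC-STAR's method, is a majority vote: run the same input through several systems and keep the hypothesis that most of them agree on. The function name and the sample sentences are hypothetical.

```python
from collections import Counter


def combine_hypotheses(hypotheses: list[str]) -> str:
    """Return the hypothesis that the most systems agree on.

    Whitespace and letter case are normalised before counting, and the
    normalised form is what gets returned. On a tie, the hypothesis that
    appeared first in the input wins.
    """
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    normalised = [" ".join(h.lower().split()) for h in hypotheses]
    winner, _count = Counter(normalised).most_common(1)[0]
    return winner


# Three imaginary recognisers transcribe the same utterance.
outputs = ["the cat sat", "The cat  sat", "the hat sat"]
best = combine_hypotheses(outputs)
```

Voting on whole sentences is the crudest version of the idea; it only helps when the systems make different mistakes, so that an error by one is outvoted by the others.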
The system obviously still cannot match human translators, but the TC-STAR project states that within a few years, a commercially viable automatic speech-to-speech translator might become available. Until then, the components of the project have been released under an open-source license.


