AlphaGo’s surprising success points to just how much progress has been made in artificial intelligence over the last few years, after decades of frustration and setbacks often described as an “AI winter.” Deep learning means that machines can increasingly teach themselves how to perform complex tasks that only a couple of years ago were thought to require the unique intelligence of humans. Self-driving cars are already a foreseeable possibility. In the near future, systems based on deep learning will help diagnose diseases and recommend treatments.
Yet despite these impressive advances, one fundamental capability remains elusive: language. Systems like Siri and IBM’s Watson can follow simple spoken or typed commands and answer basic questions, but they can’t hold a conversation and have no real understanding of the words they use. If AI is to be truly transformative, this must change.
Siri, Google Now, and Cortana are more like slow and cumbersome command line interfaces than actual AIs or deep learning or whatever – they’re just a shell to a very limited number of commands, and they can barely process even those because of the terrible speech recognition.
Language is incredibly hard. I don’t think most people fully realise just how complex language can be. Back when I still had a job in a local hardware store and spent several days a week among people who spoke the local dialect, my friends from towns mere kilometres away couldn’t understand me if I went full local on them. I didn’t actually speak the full dialect – but growing up here and working with people in a store every day had a huge effect on the pronunciation of my Dutch, to the point where friends from out of town had no idea what I was talking about, even though we were speaking the same language and I wasn’t using any special or strange words.
That breadth of pronunciation within the same language is incredibly hard for computers to deal with. Even though my town and the next town over are only about 1-2 kilometres apart, there’s a distinct pronunciation difference with some words if you listen carefully to longtime residents of either town. It’s relatively elementary to program a computer to recognise Standard Dutch with perfect AN pronunciation (which I can actually do if I try; my mother, who is from the area where Standard Dutch is from, speaks it naturally), but any minor variation in pronunciation or even voice can trip it up – let alone accents, dialects, or local proverbs or fixed expressions.
The question is, then, one that we have discussed before in my article on Palm and Palm OS:
There are several key takeaways from Dimond’s Stylator project, the most important of which is that it touches upon a crucial aspect of the implementation of handwriting recognition: do you create a system that tries to recognise handwriting, no matter whose handwriting it is – or, alternatively, do you ask that users learn a specific handwriting that is easier for the system to recognise? This would prove to be a question critical to Palm’s success (but it’ll be a while before we get to that!).
If speech recognition is going to keep sucking as much as it does, today’s engineers either have to brute-force it – throw tons of power at the problem – or ask of their users that they speak Standard Dutch or whatever it’s called for your language when talking to their computers.
I’m not optimistic for the coming 10-20 years.