Why has it taken until the last few years for speech recognition to be adopted in day-to-day use? The technology has many hidden industrial applications, but as a real-time user interface, i.e. talking to your computer, adoption has been unbelievably slow. When I was studying in the 90s, I read about a sort of reverse Turing test that demonstrated one reason why. Volunteers believed they were talking to a computer, but the responses were actually typed by a human being “behind the curtain”. The observations and subsequent interviews showed that, back then, people simply didn’t like it.
So, what’s the problem?
We have a Google Home in the house, and we basically only use it to set kitchen timers and find out the outside temperature (so we know how many layers to put on – we live on the Arctic Circle, where -25 to -30°C is normal). That’s it. I don’t see much use for anything else, as our computers and smartphones are both easier to use and faster than any voice assistant or voice input.
The key to modern voice assistants is that they are basically glorified command line interfaces – they need a command and parameters. What makes them so hard to use is that these commands and parameters are pretty much entirely undiscoverable and ever-changing, unlike actual command line interfaces where they are easily discoverable and static. If voice input and voice assistants really want to take off, we’ll need to make some serious advances in not just recording our voices and mapping them to commands and parameters, but in actually understanding what we as humans are saying.
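To make the command line comparison concrete, here is a minimal sketch in Python of that command-plus-parameters model. It is an illustration only, not how Google Home or any real assistant is implemented: the `COMMANDS` table, the phrase patterns and the `parse` function are all made up for this example, and it starts from text that has already been transcribed.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical command table: each rigid phrase pattern maps to a command
# name. Named groups in the pattern become the command's parameters.
COMMANDS = [
    (re.compile(r"^set (?:a )?timer for (?P<minutes>\d+) minutes?$"), "set_timer"),
    (re.compile(r"^what(?:'s| is) the temperature outside$"), "outside_temperature"),
]


def parse(utterance: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Map a transcribed utterance to (command, parameters), or None."""
    # Normalise case and drop surrounding whitespace and trailing punctuation.
    text = utterance.lower().strip().rstrip("?.!")
    for pattern, command in COMMANDS:
        match = pattern.match(text)
        if match:
            return command, match.groupdict()
    # No pattern matched: the user has to guess a different phrasing.
    return None


if __name__ == "__main__":
    print(parse("Set a timer for 10 minutes"))
    print(parse("What's the temperature outside?"))
    print(parse("Remind me in ten minutes"))
```

The first two phrases happen to match the table; the third means much the same as the first to a human, but falls through to `None`. Nothing tells the user which phrasings exist, and the sketch only matches patterns. It does not understand what was said.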
We’re a long way off from that.