“You’ve heard of killer apps? How about an app killer? This is what voice recognition has become over the years, because for the most part, it doesn’t work. I see no evidence that it ever will, at least not in the sense that we can achieve true voice dictation capability. What annoys everyone most about voice recognition is that it almost works. This is the problem. When something almost works, developers continue with the same thinking that got them to “almost,” rather than starting over with new ideas. We are now stuck in a blind alley.” Who else? Dvorak is hitting the nail on the head again.
I would rather type late at night than dictate to my computer, regardless of the effectiveness of voice recognition. I tend to think and construct sentences better without having to listen to my own voice, not to mention listen to myself editing a document with my own voice. It’s also far less effort. I would also rather control elements of the computer with my eyes or my fingers (directly) rather than a mouse. Things like scrolling, pointers and menus should be made far more intuitive. Voice recognition is far from ideal for most applications, even if it were perfect and the editing effortless. I find it annoying. After all, I don’t say things aloud when I hand-write a letter. Why should I have to with the computer?
I don’t talk to objects in my 3D environment to get them to do things. I move them with my hands, and view them with my eyes, and see other surfaces by changing my view. Computers should be more intuitive, and voice recognition can play a small part, but I think it’s far from the best answer for interacting with everything. I think the next step is some sort of 3D interface.
Lastly, I think Dvorak is a media whore looking for page hits who states the obvious and exaggerates it for sensationalism, true to every mainstream writer out there appealing to the lowest common denominator.
As a cognitive science student attending a speech technology class, I can say that the developers of current commercial software don’t take a lot of the information in the signal into account. There is research going on in prosody (I am not sure that’s the correct English word; the “melody” of a sentence), which has the potential to greatly improve the recognizer’s disambiguation task.
Furthermore, there are alternatives being developed to the traditional CFG (context-free grammar) parsers, probabilistic theories are getting better in the language engineering area, etc.
As for apps, there are many situations where the operator cannot use a traditional keyboard, such as driving a car or piloting a plane.
How is training voice recognition software to your voice cheating?
Do you think Christopher Reeve would think it was ‘cheating’? I’m sure he would just be happy that he can use the computer on his own. In my experience, some of this cheatware is also very good, and can be a big time saver in East Asian languages, where typing can become extremely complex.
I agree with the author that voice recognition on a grand scale is not yet ready. In fact, we have a voice recognition operator at our work, and it is a pretty big joke. If you call the main phone line, the computer picks up and asks who you would like to talk to. You say the first and last name, and in theory, it will connect you to their extension. Of course, there are so many difficult-to-pronounce names around here (I work in the computer business) that 70% of the time the recognition fails entirely, and you have to be redirected to a human operator. The funny thing is that the software has its own way of pronouncing a name; the pronunciation is not based on how the person whose name it is would pronounce it. This is really stupid because it tries to pronounce every name as if it were an English name.
Well that’s just an example. Untrained voice recognition has a long way to go, but I don’t think it is unattainable. Trained voice recognition is just starting to come of age, and in my opinion has very practical real world applications. This is not an attribute of what I would consider to be cheatware.
What a lame article. Dvorak showing his usual style of slamming something to get media attention.
Just because his Dragon Dictate, or whatever his pet peeve of the moment is, is not doing 400wpm without skipping a beat, and has to be trained for his voice, does not mean that language engineering has not advanced over the years. He doesn’t even mention recognition advances such as picking speech out of a high dB noise floor, which is essential for pilots, in-car use, etc.
Maybe he should take his misguided opinions up with the paraplegics, blind, and disabled folk who depend on this technology to even use a computer at all.
If I were a VR developer right about now I’d be thinking about sampling his voice from TechTV, putting said voice fingerprint into my software and ensuring it did not work for him. :/
I wrote a vowel recognition program over 10 years ago. If you’d want to speak, you’d speak to a human being, not to a silicon analyzer. Go go neural networks..
Read these:
Opinion Boy: “I would rather type late at night than dictate to my computer, regardless of the effectiveness of voice recognition”
V Turjanmaa: “If you’d want to speak, you’d speak to a human being, not to a silicon analyzer.”
…and now imagine these quotes
“If I want to write to someone, I’ll use the postal service – who needs e-mail?”
“I get my news from the newspaper, thank you, not this internet thing.”
“We use the radio, which is far more used than the television.”
They’re the equivalent. Don’t analyze new technologies by today’s standards; think of how we might use them in the future. 10 years ago cell phones and PDAs were only for the extremely rich and were quite useless. Today, my RIM BlackBerry and digital mobile phone are essential to my job.
Never forget, the head of the patent office around 1900 is famously (if perhaps apocryphally) quoted as saying: “Everything that is worth inventing has already been invented.” And that was before TV, vacuum cleaners, computers, the internet, jet planes, dishwashers, electric washing machines, and electric air conditioning, to name a few.
I still don’t think I’d want to use voice commands for my computer. I could just imagine a voice only interface:
Me- “Computer, open Outlook 2010.”
Computer-“Command accepted.”
Me- “Computer, create email message.”
Computer-“Error, ‘create email’ unrecognized command.”
Me- “Computer, create NEW email message.”
Computer-“Command accepted.”
Me- “Computer, begin dictation.”
Computer-“Command accepted.”
Me- “Hello mother, I will be visiting shortly right after . . (telephone–RING RING RING) Hello? no I didn’t order that. A WHAT! No Go@@**it I won’t accept the charges. Well &%%$ you too. . . ”
Computer-“Dictation recorded. Message sent.”
Present voice / sound recognition techniques work on the basis of statistical analysis and comparisons with known sounds and words, etc. The present problems are due to the large variance between different voices – any statistical comparison must be able to ignore the differences between voices and still identify the word correctly, let alone put it in the right context.
We need to get the math up to scratch for VR to work properly.
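To make the “ignore the differences between voices” point concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the four-number “feature vectors”, the `TEMPLATES` table and the `recognize` function are made up, and real recognizers use far richer features and models. It only shows the general idea of normalizing each utterance before comparing it with known words, so that a louder or offset voice still matches the same template.

```python
import math

def normalize(features):
    """Rescale one utterance to zero mean and unit spread (a crude speaker normalization)."""
    mean = sum(features) / len(features)
    var = sum((x - mean) ** 2 for x in features) / len(features)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero for a flat signal
    return [(x - mean) / std for x in features]

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(features, templates, use_normalization=True):
    """Return the template word closest to the utterance."""
    prep = normalize if use_normalization else list
    probe = prep(features)
    return min(templates, key=lambda word: distance(probe, prep(templates[word])))

# Hypothetical "known words"; the numbers are invented, not real acoustic data.
TEMPLATES = {
    "yes": [0.0, 2.0, 0.0, 2.0],
    "no": [3.0, 3.0, 5.0, 5.0],
}

# The same "yes" pattern from a different voice: scaled and shifted.
other_voice_yes = [1.5 * x + 3.0 for x in TEMPLATES["yes"]]

print(recognize(other_voice_yes, TEMPLATES, use_normalization=False))
print(recognize(other_voice_yes, TEMPLATES))
```

In this toy setup the raw comparison is fooled by the second voice and picks the wrong word, while the normalized comparison matches the right one. Real voices differ in far messier ways than a scale and a shift, which is presumably the math that still needs to get up to scratch.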
>>”developers continue with the same thinking that got them to “almost,” rather than starting over with new ideas.”<<
So . . . that’s what happened with Windows.
All the work being done in this field right now is going nowhere. First they need to have a good engine to “understand” English (or other language). Then they could work on the sound->text translator.
This pretty much means: get AI right, then go on to the sound sampling. Current voice recognition, even at 99% accuracy, is useless because current OSes are better to work with using a keyboard. Voice recognition will need a complete rewrite of the OS, not just the UI but many parts of the structure.
these comments will be funny in 10 years…
OpinionBoy: “I would rather type late at night than dictate to my computer, regardless of the effectiveness of voice recognition”
—-
Well, unless the computer can convince me to actually change my habits and love listening to my voice when I construct sentences, I’ll hold you to that one. Voice recognition has its uses – definitely – and some of them are truly groundbreaking, but I will never be writing with my voice, *regardless* of the effectiveness of voice recognition. It’s unnatural to my thought process, and I ain’t gonna be talking to a computer in the late-afternoon or night, when I tend to be at my most creative. Also, I reckon putting a bunch of people in a room with voice recognition software would be a bit of a nightmare in the noise department 🙂
Aside from the misleading title I have to agree with Dvorak. I bought one of those programs to try out. For those who haven’t tried it- the recognition does get better as you use it – slowly. The learning process is really tedious. The program is always guessing what you’re saying, often with hilarious results.
“first they need to have a good engine to “understand” english (or other language)”
I don’t think command of the language is what’s needed – more an understanding of language usage. A lot of this is regional – compare the “American English” phrases (let alone the accent) of a New Englander with a Texan. So many words in English (and other languages, IIRC) sound the same or similar. Without putting the words in context they cannot be identified accurately.
Preprogramming these products to understand a myriad of usages would be nearly impossible (and prohibitively expensive) so the tedious learning process is probably here to stay.
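For what “putting the words in context” could look like, here is a minimal Python sketch. It is an illustration under loud assumptions: the homophone groups and the word-pair counts are invented, and `disambiguate` is a hypothetical helper, not how any shipping product works. The idea is simply to pick, among words that sound alike, the spelling that most often follows the previous word.

```python
# Words that sound alike (a tiny, hand-picked sample).
HOMOPHONE_GROUPS = [
    {"to", "too", "two"},
    {"their", "there"},
    {"write", "right"},
]

# Invented counts of how often one word follows another.
BIGRAM_COUNTS = {
    ("want", "to"): 50, ("want", "two"): 5, ("want", "too"): 1,
    ("to", "write"): 30, ("to", "right"): 2,
    ("over", "there"): 20, ("over", "their"): 3,
}

def candidates(word):
    """Return every spelling that sounds like `word` (or just the word itself)."""
    for group in HOMOPHONE_GROUPS:
        if word in group:
            return sorted(group)
    return [word]

def disambiguate(heard):
    """Choose, word by word, the sound-alike that best fits the previous word."""
    result = []
    prev = "<s>"  # start-of-sentence marker
    for word in heard:
        best = word
        best_count = BIGRAM_COUNTS.get((prev, word), 0)
        for option in candidates(word):
            count = BIGRAM_COUNTS.get((prev, option), 0)
            if count > best_count:  # only override when context gives evidence
                best, best_count = option, count
        result.append(best)
        prev = best
    return result

print(" ".join(disambiguate(["i", "want", "two", "right", "a", "letter"])))
```

With these made-up counts the misheard “two right” comes out as “to write”. The catch is the one already mentioned: the counts depend on region and on the individual speaker, so a table shipped in the box only gets you part of the way, and the program still has to learn from you.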
Guys and Gals.
It is clear that most if not all of you have not used Voice Recognition systems.
I have for the past 11 years, on and off, in part because some had convenient
interfaces and then the hardware/OS of choice didn’t have software for such on it.
Training takes little to NO real time now, and it really isn’t a hassle. And
honestly, personalities or retraining are not a big deal either. I can say
to my computer what I want when I want, unfortunately in some software and not others.
That is the big deal, not the training, or that I don’t get 400 wpm in a 2-3 GHz
system (now dual 1.4 and beyond :o) with such software. The problem is not how
easy it works, or that it has to be trained, or how accurate it is (I make all kinds
of TYPOS when I type, and also when I talk I rephrase things all the time; to err is
human, so don’t expect your computer to get it right for you, Dvorak, got it???).
The thing works, and if it does it 90%+ of the time, that is better than cars
that don’t have autopilot. You take the hands off the wheel and you are dead in
seconds, or minutes at high speed. So using technology takes guidance, always has
and likely in my lifetime always will. I just wish there was such a product for my
OS of choice now, the BeOS (dead, but fully enjoying it, shows you how much technology
needs to really advance now doesn’t it :o)
Dr. UBA
Dvorak is missing the point: consumer versions of voice recognition are not the real market. The real markets are (1) telematics and (2) call centres, and both markets are using very limited forms of voice recognition.
They are not only killer apps, they are call centre job killers. And they will be getting even cheaper when VoiceXML or MSFT’s SALT gets wide deployment.
Hey — just to let everyone know — mac os x’s built in speech recognition works, and works WELL.
i have no problems with it, except that everyone makes fun of me when i use it!!
“Quit this application”
“Get my mail”
“Switch to Terminal”
it just feels kind of lame to use it, that’s all. like i’m some BIG computer geek.
The problem with voice recognition and a lot of other holy grails (such as agents you can “tell” to perform tasks, i.e. “find some cheap tickets to the next hockey game; try ebay and ticketmaster”), is the lack of a “common-knowledge base”; the sort of thing that lets software figure out what you mean from what you say (and ask for clarification if it can’t tell right away).
People have been working on this for a long time. The Cyc (“encyclopedia”) team recently released their first version; it’s primitive, but brings hope that some of the above will eventually come to fruition.
I agree with Dvorak that VR hasn’t lived up to its promises, but nothing could live up to the hype that surrounded it several years ago; duh. Anyone who listens to marketing deserves what they get. It’s coming though, give it time.
http://www.cyc.com/