Talking to your computer has been a staple of science fiction since at least the 1960s, but it looks as if it’s finally coming within reach. This week saw the release of the first speech recognition software capable of handling continuous speech without the user having to train it in advance, namely Nuance’s Dragon Naturally Speaking (DNS) version 9. For anyone else who tried IBM ViaVoice or Dragon Dictate a few years ago and found it awkward to get the system used to their voice, and even more awkward to speak in a staccato, word-by-word fashion, this is a huge leap forward.
I don’t think it is the “first speech recognition software capable of handling continuous speech without the user having to train it in advance.” Sphinx (http://cmusphinx.sourceforge.net/html/cmusphinx.php) doesn’t really use training (although you can) and has been around for a long time (decade?). Perhaps they meant the first commercial or consumer software package, but it is definitely not the first software.
As my friend who uses Sphinx on a robot noted, “you can get near 100% accuracy with Sphinx4 if you use a JSGF finite state grammar and are in quiet conditions and/or have a good mic close to your mouth.” That said, Sphinx is usually limited to a small vocabulary (< 100 words), so this new software might be better at larger sets of words.
Sphinx relies on a limited domain of discourse to reduce the difficulty of the task. It’s not just that it has a small vocabulary, but that it has a task-specific grammar to use.
If you want to have fun with Sphinx or similar systems, call a voicemail system that uses it and ask it about something relevant but unrelated — like ask the bus kiosk how the weather is.
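To make the "task-specific grammar" point concrete, here is a toy sketch in Python. It is not Sphinx code and not how Sphinx actually decodes: a real recognizer applies the grammar during the acoustic search, whereas this sketch swaps in a simple word-level edit distance on an already-transcribed hypothesis. The grammar, its phrases, and the function names (`GRAMMAR`, `expand`, `word_edit_distance`, `recognize`) are all hypothetical, loosely in the spirit of a JSGF finite-state grammar. What it does show is why such a system is accurate inside its domain and oblivious outside it: the only possible outputs are sentences the grammar accepts.

```python
from itertools import product

# Hypothetical bus-kiosk grammar (an assumption for illustration): each slot
# lists the alternatives allowed at that position, like a tiny finite-state grammar.
GRAMMAR = [
    ["when is", "where is"],
    ["the next", "the last"],
    ["bus", "train"],
    ["to downtown", "to the airport"],
]


def expand(grammar):
    """Enumerate every sentence the grammar accepts (16 for GRAMMAR)."""
    return [" ".join(parts) for parts in product(*grammar)]


def word_edit_distance(a, b):
    """Levenshtein distance counted in words rather than characters."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, 1):
        cur = [i]
        for j, wy in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                 # drop a word from a
                           cur[j - 1] + 1,              # insert a word from b
                           prev[j - 1] + (wx != wy)))   # match or substitute
        prev = cur
    return prev[-1]


def recognize(hypothesis, grammar=GRAMMAR):
    """Snap a raw word hypothesis to the closest in-grammar sentence.

    Stand-in for grammar-constrained decoding: whatever is said, the
    answer is always one of the sentences the grammar allows.
    """
    return min(expand(grammar), key=lambda s: word_edit_distance(hypothesis, s))


if __name__ == "__main__":
    print(recognize("where is the last train to the airport please"))
    print(recognize("how is the weather today"))
```

An in-domain request with a stray extra word still comes back as the intended sentence, but asking this toy kiosk `recognize("how is the weather today")` returns some bus-schedule sentence, because nothing outside the grammar can ever be an answer.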
viavoice and dragon dictate both do continuous speech recognition now, but require training to get their efficiency up to a usable level.
i would have to try dragon 9 before i would believe that it has a high recognition rate without training. that’s a very difficult problem.
however, i’ve had reasonably good luck with both dragon and viavoice, after careful training.
neither works well for programming, though.
Tried DNS 8 a few months ago, and as soon as I would speak, CPU usage would go up to 99% and my computer would grind to a halt, this on a P4 2.8GHz w/512MB of RAM. It was hard to tell how accurate it was, as having to speak one sentence at a time and then wait 30 seconds or so for the results was more trouble than it was worth.
I’m desperately looking for a way to get text from a printed book onto a computer. I tried one of those OCR pen scanners (C-pen 800), and I guess I can’t scan in a straight line or something, because that thing didn’t work for sh*t.
I’d gladly pay $1,000 or more for a workable solution.
Edited 2006-07-22 02:28
Buy a copy of the book you can afford to destroy. Remove it from its binding. Get a decent OCR program and a flatbed scanner. Train the OCR software on the fonts used in the book.
If you have to do this with multiple books, find a library that has a copy machine designed for copying from bound material, and use it to make copies of the pages you need to scan.
The trick is a decent flatbed scanner and decent OCR software, and you should be able to get both together for less than a grand.
Most places won’t let you make copies of copyrighted material.
Libraries will let you make copies of limited amounts of copyrighted material, since that’s allowed under fair use.
Dragon, the world’s first continuous-speech dictation software, was sidelined
after its creators (Jim and Janet Baker) sold it to L & H for stock.
L & H went “belly up” after it was found to have fabricated $277 million in revenue.
From the Wired article:
http://www.wired.com/wired/archive/11.02/code_pr.html
“Left with nothing, Jim and Janet Baker turned to the courts. In a failed
attempt to retrieve Dragon from among the L&H assets that were now locked up
by bankruptcy laws, they hired the powerhouse law firm run by David Boies. …
The shelves of their home are crowded with figurines, all colors and sizes –
made of glass, wood, plastic, brass – and all shaped like dragons, emblems of
the company they no longer own. Sitting at her dining room table, Janet Baker
is stoic. Her still hands rest on a place mat. It’s as if she’s at a vast
distance from Dragon Systems.
But she’s not. The Dragon application, with the 300,000-line recognizer at
its heart, lives just a couple of dozen exits north on Route 128. The code’s
new owner, ScanSoft, bought it at auction in the luxurious law offices of a
bankruptcy firm…
Janet Baker has reservations about how her software will fare. ‘ScanSoft will
make incremental improvements,’ she says politely, ‘but they won’t apply the
resources we did. The progress in the field has slowed immensely with Dragon
out of the picture.’
As for the OCR problem, try using a flatbed scanner. You won’t have the
problem of a shaky hand with no hand involved.
I suffer from RSI occasionally, so I use Dragon to type things that don’t require immediate responses (e.g. long e-mails on the back burner or notes to myself). If you use Dragon with its own program (DragonPad) and then cut and paste the results into whatever editor you’d really like to use (MS Word, or an edit box in Firefox), everything is great. The point of Dragon isn’t to replace your keyboard entirely, but just to let us use it a bit less, since keyboards cause injuries and are awkward to use, even though most techies have managed to master them.
I threw in for the upgrade to Dragon 9; I hope it delivers what’s promised, namely improved accuracy.
Using voice command, I always feel weird sitting in a room by myself talking to my computer. Dictation is another story, because you are just converting speech to text, but I find the whole concept of voice command unsettling.
Why? Are you scared your PC will talk back to you?
You think Clippy is bad? Try listening to Peedy read some time.
David Pogue gave Dragon Naturally Speaking a very favorable review in the NY Times tech section here:
http://www.nytimes.com/2006/07/20/technology/20pogue.html?ei=5087~*…
I used DNS Preferred extensively several years ago. Pogue says that the initial training session is now optional and recognition is quite good at 99.6%. It improves with training.