Apple’s recent announcement of Spoken Interface has moved speech recognition to the forefront. However, Mac OS X has included speech recognition and synthesis technologies for quite some time, and in this article MacDevCenter delves into the often misunderstood world of talking to your Mac.
But not to the depth and breadth that Apple is talking about integrating now.
This technology is not about speech recognition. People either (1) read only the headlines and not the actual technology, or (2) get their information from someone who does (1) and don’t check it. The Spoken Interface technology is a new assistive feature built into the operating system for people with diminished vision. It reads the screen to you. It is very similar to screen readers on the PC, but it is better because it is built in and requires very little to be added to new Cocoa programs.
You exhibit the failure to read the article that you attribute to others. The article IS about speech recognition. The bulk of the article is a practical demonstration of using OS X’s built-in speech recognition capabilities to interact with the computer. The article demonstrates how to activate simple (and even complex) navigation commands. Text-to-speech capabilities are covered in the article (and have been a part of the Mac OS for many years), but the text as a whole is not focused on benefits for the physically impaired (although the article does point out how screen zooming could be connected to a speakable item).
“The documentation provided by Apple states that the Speech Manager — the component that takes care of piping the text into the Speech Synthesizer — was first introduced in 1993. Once again, this shows how innovative Apple can be. Computers of that time were very different from what we know today, and adding speech capabilities to a consumer product — even thinking about it — was a real breakthrough. If not somewhat crazy.”
The Amiga had speech in 1985; the Commodore 64 had SAM (Software Automatic Mouth) even earlier.
Credit where credit is due…
I’m sorry if I made it seem as if this article were about Spoken Interface. I was commenting on the poster leading into speech recognition with Spoken Interface, which the poster claims are one and the same.
I meant to say it’s not the poster’s fault but the editor’s fault. [The editor of the article, that is.]
Can we just please stop that misunderstanding right now?
It’s text-to-speech, as in the computer speaks to you.
Visually impaired people have no problem using a keyboard; they do not need speech recognition.
And no one is claiming this is something new; it’s just the first time, to my knowledge, that it has been built right into the OS.
I’m really looking forward to what Apple’s ease of use can do for text-to-speech.
I wonder if speech-based navigation will supplant the keyboard/mouse combination, given that GUIs and OSes have been built around those metaphors for quite some time. It would seem infinitely more efficient to navigate a modern OS with something like a keyboard shortcut than to speak “computer, close the window” or something similar. It would seem that until computers evolve into a new paradigm (away from the desktop/folder/file metaphor), speech-based navigation will remain artificially grafted onto the existing one. That is to say, speech isn’t really interacting with the data; it is emulating the points and clicks of a mouse and the presses of a keyboard, which in turn interact with the data. Speech-based navigation in its current form seems to be simply another layer of abstraction. I am not knocking Apple’s push, however, as perhaps the way we interact with computers can be a driving, rather than reactive, force in OS development. I am a little concerned, though, about the bloat that comes with it for something I never use (27.6 MB for a voice!).
That being said, speech technologies do seem to be a huge boon to some of the physically disabled, and a good means to prevent repetitive stress disorders (although until dictation becomes more precise, keying will still be necessary). I’m not sure how crazy I would be about everyone in the cubes next to me barking “computer, open the smith account” all day, though. Maybe the new projection keyboards might be a workable option for carpal tunnel rather than yakking away into a microphone.
And before anyone has a quick reaction about text-to-speech being distinct from speech-based navigation, make certain to read all three pages of the article. Yes, text-to-speech is covered on the first page, and it is nothing really new, although Apple is working to make it more natural. The other two pages cover using OS X’s speech recognition capabilities to interface with the computer. Text-to-speech and speech recognition are distinct, but as the article implies, they can be used together as an interface that marginalizes use of the keyboard. For example, the computer reads a dialog box to you through text-to-speech, and then you tell it (through speech recognition) to close the dialog box.
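To make the combination concrete, here is a toy sketch in Python (nothing to do with Apple’s actual APIs; all the phrases and responses are made up): a recognizer hands you a phrase, a table maps phrases to actions, and the string an action returns is what text-to-speech would read back to the user.

```python
# Toy model of a phrase-driven interface (not Apple's API): a recognized
# phrase is looked up in a command table, and the returned string is what
# text-to-speech would read back to the user.
def make_dispatcher(commands):
    def dispatch(phrase):
        action = commands.get(phrase.strip().lower())
        if action is None:
            return "Sorry, I didn't catch that."
        return action()
    return dispatch

dispatch = make_dispatcher({
    "check my email": lambda: "You have one new message. Should I read it?",
    "close the dialog": lambda: "Dialog closed.",
})

print(dispatch("Check my email"))      # -> You have one new message. Should I read it?
print(dispatch("make me a sandwich"))  # -> Sorry, I didn't catch that.
```

The point of the sketch is only the shape of the loop: recognition narrows speech to one of a small set of known phrases (which is why Speakable Items works well with a limited vocabulary), and synthesis closes the loop by speaking the result.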
“The Amiga had speech in 1985; the Commodore 64 had SAM (Software Automatic Mouth) even earlier.”
Only the Speech Manager was introduced in 1993. Remember the old video where they first introduced the Macintosh? It was telling jokes to the crowd.
Credit where credit is due…
I know they are probably adding more features,
but doing something like:
1. highlight text
2. Apple menu -> Services -> Start Speaking Text
(or something like that)
is fun sometimes.
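For what it’s worth, the same trick works from a script: OS X ships a `say` command-line tool that feeds text to the speech synthesizer. A minimal Python sketch (the `speak` wrapper is my own invention, not an Apple API) might look like this; it only actually speaks when `say` exists on the system, so it degrades gracefully elsewhere:

```python
import shutil
import subprocess

def speak(text):
    """Build the `say` invocation and run it when the tool is present (macOS)."""
    cmd = ["say", text]
    if shutil.which("say") is not None:  # `say` ships with OS X; skip elsewhere
        subprocess.run(cmd, check=True)
    return cmd

print(speak("Start speaking text, but from a script"))
```

Handy for things like having a long build announce itself when it finishes.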
These “innovation” discussions are always so boring. Inevitably, these things were done first in the 1960s:
http://www.acoustics.hut.fi/~slemmett/dippa/chap2.html
The first fully automatic text-to-speech system was made in Japan in 1968.
Integrating things into the OS does *not* count as innovation…
Has anyone watched the old TV show Connections (with James Burke) or The Day the Universe Changed? I loved those shows. They give a pretty interesting insight into a lot of the “innovation” that has gone on throughout history. Moral: innovation pretty much never occurs in a vacuum. People build on other people’s work.
Rayiner, this is about combining speech recognition and text-to-speech to create a comprehensive interface that allows you to never use the keyboard to operate a program or the OS.
There seems to be some confusion here. The innovation is NOT JUST SPEECH SYNTHESIS. Speech synthesis was in the old Speak and Spell, for Pete’s sake. There is a difference between INVENTION and INNOVATION. The former implies autonomous creation; the latter implies insightful revision (although this is arguably a difference of degree, as DJ Jedi Jeff implies, since nothing really is generated in a vacuum).
Apple’s innovation is in SPEECH RECOGNITION and in combining speech synthesis and speech recognition so that your computer can talk to you and you can give it vocal commands. Like Star Trek, people (if that is a helpful metaphor). You say “check my email” and it says “You have one email from [email protected]. Should I read it to you?”
The importance of it being at the OS level is that programs could tap into this capability so they would behave uniformly across applications (as cut and paste does now). This is unquestionably an innovation, and an interesting one at that. The real question is how useful it may actually be once it becomes mature.
Here’s something related and interesting: the PS2 recently had a game come out called Lifeline, which is controlled entirely with a USB headset:
http://www.wired.com/news/games/0,2101,62672,00.html?tw=wn_tophead_…
This isn’t for voice chat; the headset is the only input device used to manipulate the game. I haven’t played it, and it has gotten mixed reviews, but apparently the voice recognition is pretty sophisticated. There is also supposedly a new Harry Potter game coming out that may incorporate the PS2 camera as an input device, such that players can use hand motions to cast spells and whatnot. That’s kind of interesting, and it is only a nudge away from using a headset and a camera to use magic words and cast spells in a fantasy RPG or something.
Is it possible that if these “gimmicks” are successful, they may start to find their way into computer interfaces? The Lifeline speech engine can already be downloaded and tested on the Windows side. If these things work out, there may be even more parity between consoles and computers.
Camera input games available now: http://www.apple.com/games/articles/2003/12/toysight/
(www.toysight.com)
When I first got my Bondi Blue iMac, I played around with the voice login/password…
….
“Can you open the pod bay doors please, HAL?”
….
Sometimes Bondi ignored me!
See, that’s the thing. Talking to the computer isn’t new. Voice control and TTS have been available for a long time, integrated into the OS no less. The Windows Speech API (SAPI) has been in Windows since 1995! And I’m sure they weren’t the first ones to do it, either.
Hmm, I didn’t complete my thought. My point is that claiming “innovation” is really stupid. Innovative things are almost never successful. Rather, it’s usually completely derivative but well-executed things that are successful. I have a strong feeling, because of Apple’s traditional strength in UIs, that this speech thing will be reasonably successful, much more so than the Windows Speech API. But it won’t be innovative, not in the least.
If you are in a calm environment (or if you have a headset), the OS X speech recognition is quite useful. You can easily leave your hands on the keyboard and substitute speech recognition for most of your mouse navigation, sometimes even more efficiently. Consider this: you’re using your browser and want to switch immediately to Mail to create a new message. With one sentence you can do that and save time. I think today’s speech recognition cannot be used as your primary input device, but you can combine it with keyboard navigation and gain efficiency. The recognition in Mac OS X is not very hard to learn; you just need clear pronunciation.
Windows XP has this already.
“These “innovation” discussions are always so boring. Inevitably, these things were done first in the 1960s:
http://www.acoustics.hut.fi/~slemmett/dippa/chap2.html
The first fully automatic text-to-speech system was made in Japan in 1968.
Integrating things into the OS does *not* count as innovation…”
Geez, don’t you know: innovation != you_invented_it
Innovation can be introducing technologies into current things in new ways. A lot of the time Apple creates totally new technologies that are nice to use, and sometimes it introduces old technologies in new ways.
From Dictionary.com: “Innovation: 1. The act of introducing something new. 2. Something newly introduced.”
You only need to look at Microsoft to see a company that takes other people’s technologies, simply stuffs them into Windows, and calls it innovation without doing much, if anything, to them.
I think the point is that Apple has had the ability to take a lot of technology that has been implemented in the past and just make it work.
Selling online music was already old news when the iTMS came on the scene. Yet despite Napster and several other companies having at least a two-year head start, no one had been able to make it work, let alone legitimize buying and selling music online.
Before the iPod came around there were lots of MP3 players to choose from. Same thing: no one managed to figure out how to create something that appeals to people who really like music. Again, Apple was chided for coming out with such a ridiculous product.
Mac OS X is a great example of an OS that makes Unix usable to the masses.
The point is that despite a lot of other companies having come out with speech recognition, there’s a good possibility Apple will actually make it usable in one form or another.
Geez, don’t you know: innovation != you_invented_it
No, it generally means that you either invented it, or made significant improvements to it.
Innovation can be introducing technologies into current things in new ways.
Not by the dictionary.com definition you just gave. And in this case, Apple isn’t introducing an existing technology in a new way. It’s taking something that has been done before (speech systems) and doing something that has been done before (integrating it into the OS). They’re simply (probably) going to do a very good job of it.
Apple a lot of times creates totally new technologies that are nice things to use and sometimes they are old technologies introduced in new ways.
Apple has created rather few new technologies. Certainly, Apple can’t be called an innovative company the way HP, Xerox, DEC, IBM, etc., are innovative companies. Instead, Apple is extremely good at execution. Take OS X: nothing in OS X is very innovative, yet it is very well executed, and that’s why it is so popular. Or take the iPod. The Rio was innovative because it was the first pocket-sized MP3 player. The iPod, though not innovative, was sublimely executed.
Please, I hope English-speaking users (and Apple, for that matter) take into account that all of this, no matter how old or uninnovative, is English-only. We are on the fourth revision of Mac OS X, and there are still neither non-English text-to-speech nor non-English speech-recognition services in X. This renders some other Mac OS X functions rather useless for non-English users, such as Speakable Items for controlling the UI (since it cannot recognize my commands if I speak in Spanish (as spoken in Spain), my mother tongue), or the “Speak selection” contextual menu item in all Cocoa apps (such as Safari).
So this is, once again I am afraid, another of the “key points to upgrade” that not only will probably never see the light in non-English environments, but that will also take time away from Apple’s efforts to bring those environments up to date with older features that were introduced in English OSs some time ago. Such features include the rather pathetic handwriting recognition in Ink if you are not writing in English, and Sherlock’s limited support outside the US (and I am not talking about services needing a non-Apple company backing them; for instance, why on earth is the eBay channel limited to US searches?).
And what makes all this even sadder is that some of those shiny “new” features that we non-English users are missing were present in pre-X times, such as Spanish voices for the text-to-speech engine (Mexican, not Castilian, but better than nothing), which Apple has not even taken the trouble to port over directly, let alone optimize.
No, it generally means that you either invented it, or made significant improvements to it.
An “improvement” can also be:
1) implementing it in a way which is finally useful and easy to use
2) integrating the invention in a useful way with something else that no one had thought of
Not saying either of those applies in this case. But if a company is the first to do something with an invention that has been around for years, or the first to implement it well, there has to be some credit given. Whether you call that credit “innovation” or something else is up to you.