Linked by Thom Holwerda on Fri 7th Oct 2011 20:48 UTC
PDAs, Cellphones, Wireless I don't think I've ever seen this before, but please correct me if I'm wrong. Samsung and Google were supposed to unveil the Samsung Nexus Prime with Android Ice Cream Sandwich next week, but in a surprise announcement, the companies said that the press event has been cancelled - out of respect for Steve Jobs. In the meantime, leaked specifications reveal that the Nexus Prime could be a real doozy.
Thread beginning with comment 492426
RE[7]: press release interpreted
by Neolander on Mon 10th Oct 2011 16:33 UTC in reply to "RE[6]: press release interpreted"

"Siri is not à voice recognition system it's an AI system"

(Disclaimer: although I believe I have the required knowledge of physics, signal theory, and programming, I have never worked directly on a voice recognition system. So anyone who has, please correct me if you detect some bullshit in the upcoming post.)

So you believe that it is possible to make a decent voice recognition system without AI? I don't think so, and I'm going to explain why.

What is voice recognition? Basically, speech-to-text translation. The basic theory is that you take an audio file or stream of someone saying something, isolate the words and detect punctuation based on the pauses and intonation of the speech, then take each word separately and try to slice it into phonemes, which are pretty close to syllables but not quite the same thing. From the phonemes, you can get the written word. (to be continued, stupid 1000 char phone browser limit)
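To make that decomposition concrete, here is a toy Python sketch of the pipeline. Everything in it (the fake audio samples, the segment-to-phoneme table, the phoneme-to-word table) is invented purely for illustration; a real recogniser replaces each of those lookups with statistical or machine-learned models, which is exactly where the AI comes in.

```python
# Toy illustration of the decomposition described above:
# audio -> segments (split on pauses) -> phoneme labels -> written words.
# The "acoustic model" here is a made-up lookup table.

# Fake "audio": a list of amplitude samples, where runs of near-silence mark pauses.
audio = [0.9, 0.8, 0.7, 0.0, 0.0, 0.0, 0.6, 0.5, 0.0, 0.0, 0.0, 0.4, 0.3, 0.2]

def split_on_pauses(samples, silence=0.05, min_pause=3):
    """Cut the sample stream into segments wherever a long enough pause occurs."""
    segments, current, quiet = [], [], 0
    for s in samples:
        if abs(s) < silence:
            quiet += 1
            if quiet >= min_pause and current:
                segments.append(current)
                current = []
        else:
            quiet = 0
            current.append(s)
    if current:
        segments.append(current)
    return segments

# Pretend each segment maps to a phoneme sequence (in reality this is the hard,
# statistical part), and each phoneme sequence maps to a word.
SEGMENT_TO_PHONEMES = {3: ["k", "ae", "t"], 2: ["s", "ae", "t"]}  # keyed by segment length, purely for the demo
PHONEMES_TO_WORD = {("k", "ae", "t"): "cat", ("s", "ae", "t"): "sat"}

words = []
for seg in split_on_pauses(audio):
    phonemes = tuple(SEGMENT_TO_PHONEMES.get(len(seg), []))
    words.append(PHONEMES_TO_WORD.get(phonemes, "<unknown>"))

print(" ".join(words))  # prints "cat sat cat" for the toy input above
```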

Edited 2011-10-10 16:47 UTC

Reply Parent Score: 1

Tony Swash (member since 2009-08-22)

"Siri is not à voice recognition system it's an AI system"

(Disclaimer : Although I believe I have the required knowledge of physics, signal theory, and programming, I have never worked directly on a voice recognition system. So anyone who has, please correct me if you detect some bullshit in the upcoming post)

So you believe that it is possible to make a decent voice recognition system without AI ? I don't think so.

What is voice recognition ? Basically speech to text translation. Basic theory is that you take an audio file or stream of someone saying something, you isolate words and detect punctuation based on the pauses and intonations of the talk, then you take each word separately and try to slice it into phonems, which are pretty close to syllabs but not quite the same thing. From phonems, you can get text. (to be continued, stupid 1000 char phone browser limit)


Honestly - the lengths some people will go, people who claim to be genuinely interested in technology, to argue absurdities just so they can belittle something Apple is doing. Do you really believe any of that tosh you just wrote?

Clearly speech recognition software recognises words. It may have attached to it a programme that can recognise set phrases and connect those set phrases to an object. That is impressive, but you know as well as I do that such software is very limited and that it is a very stupid system.

What Siri does is listen to what you are saying and then infer from the context of the conversation what phrases might mean. It seems to do this an order of magnitude better than anything else out there, let alone anything on a phone. So if you are having a conversation with Siri about two appointments clashing, you seem to be able to say something like 'move it to the next day' and (like a human could) Siri will know what 'it' is, what the next day is, and what moving 'it' means, all from the context of the conversation you're having with it. If it works as claimed, and those commentators with hands-on experience say it does indeed seem to work as claimed, then Siri is very, very impressive and might well represent a true step forward in the way humans interact with technology.

So as I said, if people who claim to be interested in technology want to argue that it is trivial just because it is attached to Apple, well, more fool them. The only way to lose a limiting phobia is to stop being afraid of the phobic object.

Reply Parent Score: 2

Thom_Holwerda (member since 2005-06-29)

"What Siri does is listen to what you are saying and then infer from the context of the conversation what phrases might mean. It seems to do this an order of magnitude better than anything else out there, let alone anything on a phone. So if you are having a conversation with Siri about two appointments clashing, you seem to be able to say something like 'move it to the next day' and (like a human could) Siri will know what 'it' is, what the next day is, and what moving 'it' means, all from the context of the conversation you're having with it."


Nothing in what you describe even *remotely* resembles an AI - it's just a speech recognition system. Deriving things from the context of data on your phone ("hey this new appointment you're making is clashing with this one") is not AI.

This is my worry about the system. So far, it just looks like a speech recognition system with more commands and the ability to parse some contextual data - which has so many possible error vectors it's crazy. Language parsing is VERY difficult even without contextual parsing - let alone with.

You're making it seem as if you can just say whatever you want, with Siri figuring it all out. This is highly misleading: just as with any other speech recognition system, you'll have to learn and find out which commands it supports, and which it doesn't. Programming in some default en-US sentences is all fine and dandy, but what about all the various dialects? Heck, even my friends from Amsterdam (60 km from here) have issues understanding me when I go full-on local dialect on their ass.

So far, it seems Siri will suffer from all the usual pitfalls every other speech recognition system suffers from, and not even ten bucketloads of contextual data can change that.

Edited 2011-10-10 16:59 UTC

Reply Parent Score: 1

Neolander (member since 2010-03-08)

"(Disclaimer : Although I believe I have the required knowledge of physics, signal theory, and programming, I have never worked directly on a voice recognition system. So anyone who has, please correct me if you detect some bullshit in the upcoming post)

So you believe that it is possible to make a decent voice recognition system without AI ? I don't think so.

What is voice recognition ? Basically speech to text translation. Basic theory is that you take an audio file or stream of someone saying something, you isolate words and detect punctuation based on the pauses and intonations of the talk, then you take each word separately and try to slice it into phonems, which are pretty close to syllabs but not quite the same thing. From phonems, you can get text. (to be continued, stupid 1000 char phone browser limit)"

"Honestly - the lengths some people will go, people who claim to be genuinely interested in technology, to argue absurdities just so they can belittle something Apple is doing. Do you really believe any of that tosh you just wrote?"

That voice recognition needs some learning AI algorithms to work well? I hope the continuation of the post you're quoting gave enough examples of AI use cases in voice recognition to prove this.

"Clearly speech recognition software recognises words. It may have attached to it a programme that can recognise set phrases and connect those set phrases to an object. That is impressive, but you know as well as I do that such software is very limited and that it is a very stupid system."

You're missing a great deal of complexity in the "recognises words" part, as I mentioned above...

Anyway, I want to be sure that you understand that making a computer understand textual commands is a separate problem from voice recognition, or "listening" as you call it. Good speech recognition does not have to understand what you're saying, only to work out how it is written. Conversely, understanding written sentences does not require you to recognize spoken language; for example, modern search engines do some amount of natural language processing without asking you to talk to them first.

Recognizing simple, well-defined sentences is only one example of what can be done with text translated from spoken language. It happens to be simple and reliable, which is why many devices do it. But natural language processing algorithms can go beyond that. As an example, they can work with synonyms and different noun declensions, correct your spelling and grammar, or locate the keywords in a sentence and ignore the rest.
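As a rough illustration of that last point, here is a tiny Python sketch of keyword extraction with synonym folding. The stop-word list and synonym table are invented for the example; real natural language processing relies on much richer models.

```python
# Crude sketch of "locate the keywords in a sentence and ignore the rest".
# The stop-word list and synonym table below are tiny and made up for the demo.

STOP_WORDS = {"please", "could", "you", "the", "my", "to", "a", "for", "me"}
SYNONYMS = {"postpone": "move", "shift": "move", "meeting": "appointment"}

def extract_keywords(sentence):
    words = sentence.lower().replace(",", "").replace(".", "").split()
    keywords = []
    for w in words:
        w = SYNONYMS.get(w, w)      # fold synonyms onto one canonical term
        if w not in STOP_WORDS:     # drop filler words
            keywords.append(w)
    return keywords

print(extract_keywords("Please postpone my meeting to tomorrow"))
# -> ['move', 'appointment', 'tomorrow']
```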

"What Siri does is listen to what you are saying and then infer from the context of the conversation what phrases might mean. It seems to do this an order of magnitude better than anything else out there, let alone anything on a phone."

Listening to what you are saying IS voice recognition, so what would be new there is contextual commands. But is it really new? All of the phones I have ever owned have a "back" button. Its behavior changes depending on what I'm currently doing. As an example, if I press "back" while I'm writing a text message, my phone will ask me to confirm, because it is likely that I did it by mistake. This is a typical example of a contextual command which everyone knows and loves, available on every phone sold today.

"So if you are having a conversation with Siri about two appointments clashing, you seem to be able to say something like 'move it to the next day' and (like a human could) Siri will know what 'it' is, what the next day is, and what moving 'it' means, all from the context of the conversation you're having with it."

I fail to see what's so outstanding. When I press the red button of my phone to close a running program, the message I send to the OS is no more detailed than "close that". The OS has to find out which software is currently shown on screen before it can close it.

OSs already have to deal with context-based orders in their current incarnation; you just don't see it because they are sufficiently well designed not to get it wrong.
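Here is a minimal Python sketch of that kind of context-dependent command handling, in the spirit of the "back" button and "close that" examples above. The state names and commands are invented; this is not how any particular OS, or Siri, is actually implemented.

```python
# Minimal sketch: the same command resolves differently depending on context.

class Phone:
    def __init__(self):
        self.foreground = "home screen"
        self.draft_in_progress = False

    def handle(self, command):
        if command == "back":
            if self.draft_in_progress:
                # Likely a mistake while typing, so ask for confirmation.
                return "Discard the message you are writing?"
            return "Returning to the home screen."
        if command == "close that":
            # "that" is resolved to whatever is currently on screen.
            return f"Closing {self.foreground}."
        return "Unknown command."

phone = Phone()
phone.foreground = "the calendar app"
print(phone.handle("close that"))   # -> Closing the calendar app.

phone.draft_in_progress = True
print(phone.handle("back"))         # -> Discard the message you are writing?
```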

"If it works as claimed, and those commentators with hands-on experience say it does indeed seem to work as claimed, then Siri is very, very impressive and might well represent a true step forward in the way humans interact with technology."

I agree that it may be a step forward in the integration of various existing concepts (voice recognition, textual order processing, context-sensitive commands, web searches). I don't agree that it is a big advance in any of these domains. The technology was already there. What Apple have done is put it together in a package that may or may not be more pleasant to use than other solutions. Real-world testing will tell.

"So as I said, if people who claim to be interested in technology want to argue that it is trivial just because it is attached to Apple, well, more fool them. The only way to lose a limiting phobia is to stop being afraid of the phobic object."

The main thing which I'm afraid of when it comes to Apple products is the reaction of other members of my beloved species ;)

Take touchscreens, as an example. Many people love them, probably because they're nice-looking, feel simple, and allow for more screen real estate per cubic centimeter of phone. Thus, the market for keyboard-based phones is declining. Now, when you look beyond the shiny coating, touchscreens have many drawbacks, and for many common phone use cases their usability is perfectly awful. I understand that they have advantages, but good physical keys are what works best for me. Thus, I wish I still had the choice to buy a good touchscreen-less phone. I don't. I hope voice command won't set a new fashion in a similar way and reduce my phone usage efficiency like touchscreens have.

Edited 2011-10-10 19:28 UTC

Reply Parent Score: 2