This is both the scariest and the most amazing technology Google demoed on stage during I/O today.
Today we announce Google Duplex, a new technology for conducting natural conversations to carry out “real world” tasks over the phone. The technology is directed towards completing specific tasks, such as scheduling certain types of appointments. For such tasks, the system makes the conversational experience as natural as possible, allowing people to speak normally, like they would to another person, without having to adapt to a machine.
You must listen to the recorded conversations where a computer is making appointments with a hair salon and restaurant. The computer-generated half of the conversation sounds incredibly natural, with interruptions, “uhs”, and so on. It even managed to fully understand the heavy accent of the restaurant worker, which even I had a hard time understanding at times. I am absolutely stunned this is even possible.
This is downright amazing, and will be built into the Google Assistant – so it can make appointments for you. While I doubt I’d ever even want to use something like this, there’s no denying the technology is incredibly advanced. I am wondering, though, about the possible negative consequences of this technology, especially combined with advanced video editing tools.
This will make anti-social people (or ones with social anxiety) even more withdrawn. Why bother with the cumbersome task of making a phone call, when you can have a computer do it for you?
plus, if you dread phonecalls – you can opt to send an email or text.
who exactly is that aimed at?
now that i think of it, telemarketing might pick up that technology. i just hope the countermeasures to that pestilence also will.
and, inevitably – 4chan will definitely make extensive use of it, once they get an idea of how to use it for trolling people.
Edited 2018-05-08 20:21 UTC
“I am an automated system that understands complete sentences.”
Yeah, my ass. The problem with these so-called natural systems, in addition to the fact that I have to constantly repeat myself, is that if you’re relying on one of these you might as well dump the phone system and do things with a computer or mobile device directly. If your phone system is so complicated that a simple one or two-level menu won’t get the job done, redesign it. These voice systems, no matter how good, will always be limited by the PSTN’s audio quality–or lack thereof–and this really hurts voice recognition, especially where precision is required.
The uh’s are interesting. I’ve heard that um’s and uh’s were added to Grand Theft Auto: Liberty City’s police dispatcher to mask loading times and calculation time. I wonder if this is a similar addition, or if it’s just superficial?
There’s no such game as GTA: Liberty City
Do you mean GTA 3? GTA: Liberty City Stories? GTA IV? Or even Grand Theft Auto? I think Chinatown Wars and GTA: Advance were also set in Liberty City…
On my phone, around a year ago, I had to resort to an automated spam-filter for inbound calls and texts. In reading this, I have a strong feeling that this is about to get *much* worse once phone-spammers get ahold of the source code for Google Duplex –and they will, whether it’s been released or not, eventually.
Just think how that would work… no longer does the scammer with an accent need to worry about being tripped up by his own voice when he can simply load a voice-font for the nation he’s targeting so that all automated calls that he can now script-and-forget are being handled on autofire by an AI… that’s just great.
What’s more, that’s likely just scratching the surface. How long before we can just sample someone’s voice in a conversation, then feed it to an AI like this to pull off a very accurate impersonation of whoever we please? At that point, I could see someone using this to scam people using sampled voices of the target’s own family against them. I can think of nearly endless examples of how Duplex could be used in the commission of a crime… I guess only time will tell if that comes about or not, but I’d wager that it’s more likely than not. After all, if I can think it up, someone else with much more nefarious intentions could certainly do the same.
Eventually, somebody will create a service where an ‘attendant’ AI will answer the phone for you, and weed out the ‘spammer’ AI, and these two will play a cat & mouse game, trying to outsmart each other.
They could call it “Lenny as a Service”:
https://www.youtube.com/watch?v=3CsEuJNSnh8
In IT, any sufficiently advanced technology is indistinguishable from a rigged demo.
I’d put that the other way around… any sufficiently rigged demo is indistinguishable from advanced technology…
In that call between the restaurant and Duplex there was a total breakdown in communication between both sides….
I think that somebody over at google …
is/was a Max Headroom fan …
I also think that this will eventually be used in ways we cannot even imagine right now
The more complex the AI, the more interesting it would be to make two (or more) of those things talk to each other…
I guess they didn’t consult Google DeepMind’s ethics board?
Or maybe… they did.
It does add to the list of interesting questions regarding what can be accepted as proof in the future. Images and video are pretty much out, as they can be faked in real time. Anybody’s voice will be perfectly emulated very soon (already?). So how do I know I’m really talking to my bank assistant on the phone and not the faked identical voice of a scammer? Or, how do we know if that “compromising leaked video-tape of Mr. thus-and-so” is real? What counts as proof in a court? And so on… It will be solved, but there are challenges (and opportunities) ahead.