This promise of a world responsibly empowered by AI continues to drive our work at Google DeepMind. For a long time, we’ve wanted to build a new generation of AI models, inspired by the way people understand and interact with the world. AI that feels less like a smart piece of software and more like something useful and intuitive — an expert helper or assistant.
Today, we’re a step closer to this vision as we introduce Gemini, the most capable and general model we’ve ever built.
Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research. It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video.↫ Demis Hassabis on Google’s official blog
It’s no secret I’m not particularly impressed by “AI”, not least because of its ability to autocomplete complete nonsense based on copyrighted works it’s drawing from without permission, and the dangers this might represent to our society. That being said, Google’s new “AI” thing, as demonstrated in this video, actually seems a tiny bit impressive. It still looks to me like it’s just blurting out random information using fairly mundane things like object and speech recognition, but the fluidity of it all definitely feels a lot more natural than whatever OpenAI and Microsoft have shown so far.
I’m still not even remotely interested in any of this stuff, but this at least seems slightly more potentially useful than other examples I’ve seen so far.