“It was also believed that a computer could never fit in a trouser pocket”

by admin

Google unveiled the first phase of its next-generation artificial intelligence (AI) model called Gemini in early December. CEO Sundar Pichai, who has driven development for years and was previously responsible for Chrome and Android, is notoriously product-obsessed. In 2016, he predicted in his first founder’s letter as CEO that “we are moving from a world where mobile is at the forefront to a world where AI is at the forefront.” In the years that followed, Pichai deeply integrated AI into all Google products from Android devices to the cloud.


Still, last year was largely dominated by the AI releases of another company: OpenAI. The launch of DALL-E and GPT-3.5, followed by GPT-4 this year, dominated the sector and sparked an arms race between startups and tech giants. Gemini is the latest entrant in this race.

The cutting-edge system was developed by Google DeepMind, the new organization led by Demis Hassabis that brings all of the company’s AI teams under one roof. Gemini is already integrated into Google’s chat tool Bard and will be rolled out across the company’s entire product range by next year.

MIT Technology Review spoke with Sundar Pichai on the eve of Gemini’s launch about what Gemini will mean for Google, its products, AI and society in general.

Why is Gemini so exciting? How do you see the big picture in terms of its capabilities, its benefits and the direction in which AI is evolving in all your products?

What makes it so exciting is the fact that it is a fundamentally multimodal model. Just like humans, it learns not from text alone but from text, audio, and code. That makes the model inherently more powerful, and I think it will help us develop new capabilities and drive progress in the field. That is exciting.


It’s also exciting because Gemini Ultra is state-of-the-art on 30 of the 32 leading benchmarks, and especially on the multimodal benchmarks. The MMMU [Massive Multi-discipline Multimodal Understanding] benchmark shows progress in this area. Personally, I find it exciting that on MMLU [Massive Multitask Language Understanding], one of the leading benchmarks, Gemini exceeded the 90 percent threshold. That is a major milestone.

Two years ago the state of the art was still 30 or 40 percent. Just consider how much progress has been made in this area. Human experts score around 89 percent across the benchmark’s 57 subject areas. Gemini is the first model to cross that threshold.

I’m also happy because it’s finally making its way into our products. It will be available to developers. It’s a platform. AI is a profound platform shift, bigger than the web or mobile. That’s why this moment is a big step for us.

Let’s start with those benchmarks. Gemini seems to be ahead of GPT-4 in all, or almost all, of them, but not by much, whereas GPT-4 itself was a very big leap forward. Are we approaching a plateau in what these large language models can achieve, or will the steep growth curves continue?

First of all, we still see a lot of headroom. Some of the benchmarks are already high. You have to realize that going from 85 percent to anything higher puts you at the edge of the curve. So the gains may not seem like much, but progress is being made. We will also need newer benchmarks. This is one of the reasons we looked at the multimodal MMMU benchmark: for some of these new benchmarks, the state of the art is much lower, so there is a lot left to do. The scaling laws still hold; as we make the models larger, there will be more progress. Taking it all together, I really feel we are still at the very beginning.

What are Gemini’s key breakthroughs and how will they be used?


It’s so difficult for people to imagine the jumps that will happen. We provide APIs, and people are going to think about it in a pretty deep way. I think multimodality is going to be pretty big. As we teach these models to think, there will be bigger and bigger breakthroughs – and the really deep breakthroughs are still to come.

One way to think about this question is Gemini Pro. It performs very well on benchmarks, but when we integrated it into Bard, I could feel the difference as a user. We tested it, and preference ratings rose quite a bit across all categories. That’s why we’re calling it one of our biggest upgrades yet. Even in blind comparisons, the better performance is clear. So better models lift the benchmarks, the progress is real, and we will continue to train them.

But I can’t wait to incorporate them into all of our products. These models are so powerful. Designing the products in such a way that they fully exploit the possibilities of the models – that will be exciting in the next few months.

The pressure to bring Gemini to market was probably enormous. What did you learn when you saw what happened after GPT-4 was released? What approaches have changed during this time?

One thing, at least to me: it doesn’t feel like a zero-sum game, does it? Consider how profound the shift to AI is and how early it is. There is a world of possibilities ahead of us.

But to address your specific question: It is a broad field in which we are all making progress. There is a scientific component, an academic component; there is a lot being published and we see how models like GPT-4 work in the real world. We learned from that.


Safety is an important area. With Gemini, we have learned from and improved our safety techniques based on how the models behave in practice. This shows how important things like fine-tuning are. With Med-PaLM 2, we showed, among other things, that such a model can outperform state-of-the-art models when it is fine-tuned for a specific domain. That taught us how powerful fine-tuning is.

Much of this has gone into the work on Gemini. One reason we are giving Ultra [the more advanced version of Gemini, which will launch next year] more time is that we are testing it thoroughly for safety. But we are also fine-tuning it to really exploit its capabilities.
