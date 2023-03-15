OpenAI introduces GPT-4: Language model now also understands images



GPT-4 is here: as Heise reported exclusively last week, the new generation of the AI system has now been released. GPT-4 is no longer a pure language model; in addition to text input, it can also handle images. As the CTO of Microsoft Germany indicated on March 9, 2023, at the digital kickoff event “AI in Focus” for business customers, it is indeed a multimodal model that can handle different media, albeit with limitations: the OpenAI release makes no mention of text-to-video yet. According to current knowledge, GPT-4 can interpret more complex inputs than was previously possible and can parse text and images at the same time.

More creative and complex contexts – and higher risk

According to OpenAI, the model is meant to be more creative than the previous GPT-3 series and is probably geared more towards collaboration. In addition to text input, it is also supposed to process visual input; however, it can apparently only respond in text form, not with images. The amount of text it can handle has been expanded: according to the announcement, GPT-4 can process and generate text of up to 25,000 words. The problems already known from ChatGPT have not been resolved: the model still tends to confabulate and does not always answer truthfully.

“What can I make with these ingredients?” In response, GPT-4 suggests possible dishes that can be made from eggs, flour, butter, and milk. The input is a combined text and image prompt; the output is in text form. (Image: OpenAI)

According to OpenAI, the model is able to perform creative and technical writing tasks, compose song lyrics, write screenplays, or imitate the style of its users. The capacity to generate violent or otherwise harmful content has apparently not been eliminated. According to the website, GPT-4 will be available in the ChatGPT Plus paid plan and as an API that developers can use to build their own applications and services (there is a waiting list for API access).

What is now known about GPT-4

Sam Altman, the CEO of OpenAI, pointed out that the now published version of GPT-4 differs only very slightly from GPT-3.5 in terms of conversational ability. GPT-3.5 is familiar to most users because it is the model behind ChatGPT’s chat interface. For over a year, the AI scene had been speculating about what architecture GPT-4 might have, and Altman himself dampened expectations in an interview with StrictlyVC in January 2023: after the hype, he said, the public would inevitably be disappointed. GPT-4 is not yet an AGI, i.e. not an artificial general intelligence at a human level.

In internal tests, GPT-4 is said to be significantly less likely than its predecessor models to generate unwanted content (reduced by 82 percent, according to OpenAI) and to have a 40 percent higher hit rate for facts than GPT-3.5, i.e. the familiar version behind ChatGPT. In common benchmark tests, it has apparently surpassed ChatGPT and consistently performed better: in a simulated bar exam (a final legal exam, comparable to a state exam at the end of law studies), GPT-4 is said to score in the top ten percent of test takers rather than the bottom ten percent.

Technical research report and safety training

According to the GPT-4 blog entry, the OpenAI team trained the model on “AzureAI supercomputers”. According to the announcement, GPT-4 went through six months of safety training and is said to have been adjusted towards desired behavior through reinforcement learning from human feedback. A technical research report is available on the OpenAI website. According to that report, the architecture of the model is the same as that of its predecessors: a pre-trained transformer model that statistically predicts the next words and thus generates its outputs. The model is said to continue learning during use. More about the research work behind the model can be found in a separate blog post by the research team.
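The idea of statistically predicting the next word can be illustrated with a deliberately tiny sketch. This is not a transformer and has nothing to do with OpenAI's actual implementation; it is a simple bigram counter, and the miniature corpus as well as all function names are invented here purely for illustration. It only shows the general principle: estimate from training text how likely each word is to follow another, then generate output by repeatedly sampling a next word.

```python
import random
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, how often every other word follows it."""
    words = text.lower().split()
    follower_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follower_counts[current_word][next_word] += 1
    return follower_counts


def next_word_probabilities(model, word):
    """Turn the raw follower counts for `word` into probabilities."""
    counts = model.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def generate(model, start_word, max_words=8, seed=0):
    """Generate text by repeatedly sampling a next word from the model."""
    rng = random.Random(seed)
    output = [start_word]
    for _ in range(max_words - 1):
        probs = next_word_probabilities(model, output[-1])
        if not probs:  # no known continuation: stop early
            break
        candidates, weights = zip(*probs.items())
        output.append(rng.choices(candidates, weights=weights)[0])
    return " ".join(output)


# Hypothetical miniature "training corpus", invented for this illustration.
corpus = "the model reads text . the model reads images . the model writes text ."
model = train_bigram_model(corpus)

# In the corpus, "model" is followed twice by "reads" and once by "writes".
print(next_word_probabilities(model, "model"))
print(generate(model, "the", max_words=6))
```

A real language model replaces the lookup table with a neural network that conditions on a long context rather than a single preceding word, but the generation loop (predict a distribution, pick a next token, repeat) follows the same pattern.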

GPT-4 is said to outperform existing language models in most NLP tasks and to at least be able to compete with “the vast majority of known SOTA systems” (SOTA stands for state of the art, i.e. the most powerful AI systems currently available, including those from other providers). In the course of the release, OpenAI also named some pilot customers who are already using GPT-4: the government of Iceland (to preserve its own language, as the blog entry says), the language learning app Duolingo, Stripe, and the asset management arm of the major bank Morgan Stanley.

The new Bing runs on GPT-4

In the course of the announcement, Microsoft also disclosed that the new Bing was already using GPT-4. This had already been suspected in the AI scene, since Microsoft had kept a low profile about the model version in use. Microsoft recently had to restrict its AI-assisted search to a fixed number of queries per IP address per day in order to avoid gaffes. Seen this way, there is already initial user experience with what OpenAI describes as the new model’s increased creativity, which in Microsoft’s Bing showed up above all as heightened “emotionality” in longer conversations and heavier use of emojis.

In the technical report, the OpenAI team also warns that GPT-4 “poses new risks due to the increased capabilities”. The conclusion does not say exactly which risks these are or how OpenAI intends to mitigate them. According to the report, there is still a lot to be done, and GPT-4 is a significant step on the way to widely deployable and safe AI systems. Further information can be found in the OpenAI release announcement.

