
OpenAI releases GPT-4: able to recognize pictures and calculate taxes, ChatGPT gets rid of Chat and evolves again


GPT-4 outperforms the vast majority of humans in many specialized tests. GPT-4 scores in the top 10% or so of test takers on the mock bar exam, in the top 7% or so on the SAT Reading test, and in the top 11% or so on the SAT Math test.

Just over four months after ChatGPT's release, and after it had shown such astonishing strength, OpenAI dropped another bombshell:

GPT-4 was released.

In today’s blog post, OpenAI wrote:

We created GPT-4, the latest milestone in OpenAI’s effort to scale deep learning. GPT-4 is a large multimodal model (accepting image and text input, producing text output) that, while less capable than humans in many real-world scenarios, performs at a human level on various professional and academic benchmarks.

Sounds similar to the previous generation? Rest assured: that is just OpenAI being modest.

In the subsequent live demo on YouTube, OpenAI’s president and co-founder Greg Brockman demonstrated the true strength of GPT-4: summarizing articles, writing code, filing taxes, writing poems… tasks that GPT-3.5 couldn’t manage, GPT-4 handled with ease.

But that’s just scratching the surface: GPT has evolved once again, though perhaps not in the way you’d expect.

New Model: Iterative Optimization

How do you prove that one person is better than another? Give them an exam.

So how do you prove that one AI model is better than another? The same way: an exam.

OpenAI had GPT-4 take many common human tests, and its performance on many tests and benchmarks has indeed improved greatly over the previous generation:

According to their test results, GPT-4’s SAT score has risen by 150 points, to 1410 out of 1600;

It can pass the mock bar exam with a score in roughly the top 10% of test takers, whereas GPT-3.5 scored around the bottom 10%;

On the SAT Reading and SAT Math tests, GPT-4 places in roughly the top 7% and top 11% of test takers, respectively…

“We spent six months iteratively aligning GPT-4, using lessons from our adversarial testing program as well as ChatGPT, resulting in our best-ever results on factuality, steerability, and refusing to go outside of guardrails,” OpenAI said.

“Our GPT-4 training run was (at least for us!) more stable than ever, becoming the first large model for which we were able to accurately predict its training performance in advance.”

In addition, GPT-4 has made a qualitative leap: it can now process images.

Anyone who uses ChatGPT regularly knows that it can only handle text; GPT-4, by contrast, accepts images as input.

In an example provided by OpenAI, GPT-4 accurately explained why several Internet memes are funny (though its explanations themselves are not).

Source: The New York Times

A case provided by The New York Times also shows that GPT-4 can parse text and images together, which lets it interpret more complex information. However, image input has not yet been opened to the public, so there are no further examples demonstrating GPT-4’s image-processing ability.


In the subsequent live demo, OpenAI also said that this capability is not yet publicly available, though it is already partnering with a company called Be My Eyes, which will use GPT-4 to build services.

Also, GPT-4 has started to develop a little sense of humor. It can already tell some formulaic, groan-worthy jokes, but at least it has begun to grasp the human trait of “humor”.

Source: The New York Times

Of course, in most respects GPT-4’s improvements are iterative. In casual conversation, the difference between GPT-3.5 and GPT-4 can be subtle. But once task complexity passes a certain threshold, the differences emerge: GPT-4 is more reliable, more creative, and able to follow finer-grained instructions than GPT-3.5, allowing it to solve difficult problems more accurately.

For example, Anil Gehi, an associate professor of medicine and cardiologist at the University of North Carolina at Chapel Hill, described to GPT-4 the medical history of a patient he had seen the day before, including postoperative complications and the patient’s admission to the hospital. The description contained several medical terms a layperson would not recognize.

When Dr. Gehi asked GPT-4 how he should treat the patient, it gave him the perfect answer. “That’s exactly how we treat our patients,” Dr. Gehi said. When he tried other scenarios, GPT-4 gave equally impressive answers.

Of course, the other good news is that GPT-4 has also been greatly optimized for languages other than English.

Many existing machine learning benchmarks are written in English. To get a preliminary sense of GPT-4’s capabilities in other languages, OpenAI used Azure Translate to translate the MMLU benchmark (a suite of 14,000 multiple-choice questions covering 57 topics) into a variety of languages, then ran the tests.

In 24 of the 26 languages tested, GPT-4 outperformed the English-language performance of GPT-3.5 and other large language models.

In Chinese, GPT-4 reached an accuracy of 80.1%, compared with GPT-3.5’s 70.1% accuracy in English. In other words, on this test GPT-4 understands Chinese better than ChatGPT understands English.

Live Demo: Do tax returns, write poetry, write code, do everything

If these figures and cases make it hard to get an intuitive feel for GPT-4’s true strength, Greg Brockman, OpenAI’s president and co-founder, personally gave a live demo on YouTube showing what GPT-4 can really do: summarizing articles, writing code, filing taxes, writing poems… tasks that GPT-3.5 couldn’t manage, GPT-4 handled with ease.


Greg Brockman showed GPT-4’s new user interface. On the left is a System box, which specifies the AI’s role and overall answering principles; in the middle is a dialog box, where you enter individual conversation turns, ask questions, or give feedback; on the far right are parameter settings.

In the demonstration, Brockman used the “System” box on the left to turn GPT-4 successively into “ChatGPT”, an “AI programming assistant”, and “TaxGPT” to solve different problems.
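The System-box-plus-dialog layout maps directly onto the role-based message format of OpenAI’s chat API: the System box becomes a message with the role “system”, and each dialog turn becomes a “user” or “assistant” message. As a rough sketch (the “TaxGPT” prompt wording, the model name, and the exact request shape here are illustrative assumptions; consult OpenAI’s API documentation for specifics), a request in the style of the demo might be assembled like this:

```python
# Sketch: assembling a role-based chat request in the style of Brockman's demo.
# The "System" box maps to a message with role "system"; each dialog turn maps
# to a "user" or "assistant" message.

def build_chat_request(system_prompt, turns, model="gpt-4", temperature=0.2):
    """Build a chat-completion request body from a system prompt and dialog turns."""
    messages = [{"role": "system", "content": system_prompt}]
    for role, content in turns:
        assert role in ("user", "assistant"), f"unexpected role: {role}"
        messages.append({"role": role, "content": content})
    return {"model": model, "temperature": temperature, "messages": messages}

# A "TaxGPT"-style persona, followed by a user question (both hypothetical).
request = build_chat_request(
    system_prompt="You are TaxGPT, a careful assistant that explains tax "
                  "calculations step by step and cites the relevant rule.",
    turns=[("user", "How much tax does a married couple filing jointly owe "
                    "on a taxable income of $80,000?")],
)
print(request["messages"][0]["role"])  # → system
```

The request body would then be sent to the chat completions endpoint with an API key; switching personas, as in the demo, is just a matter of swapping the system message.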

In ChatGPT mode, GPT-4 can handle more than 25,000 words of text, easily summarizing the core content of a very long article, such as the announcement post OpenAI published about GPT-4 today.

It can even summarize in all sorts of odd ways, as in the demo: for instance, using only words that start with the letter “G”.

Or ask it to turn those main points into a poem.

In “AI programming assistant” mode, it can just as easily write code, generate a website, or tackle something more complicated, such as a Discord bot. If an error occurs, say from calling a relatively new API, you don’t even need to explain it: just paste the error message into the chat, and it will automatically correct the mistake and generate new code.

Or turn it into TaxGPT and have it calculate how much tax a couple owes under the tax rules, writing out its reasoning step by step for people to review.

Greg Brockman was full of praise for the professional ability GPT-4 demonstrated: he said he had spent half an hour reading the tax document and still didn’t understand it, while GPT-4 produced the answer quickly.

Perhaps this demonstration of less than an hour truly revealed the power of GPT-4: it is no longer just a “chatbot” for ordinary users, but a sharp tool in the hands of developers, and the cornerstone of powerful tools in text, programming, taxation, and other fields we can imagine.

From this point of view, it will have a wider impact than ChatGPT.

Talking Nonsense: Still Happens, But Less Often

It has to be said that, powerful as it is, GPT-4 has limitations similar to those of earlier GPT models. Above all, it is still not entirely reliable: it still confidently makes up facts and makes reasoning errors. OpenAI stresses that users should add safeguards such as human review or additional context, and should avoid using it in high-risk situations altogether.

In the GPT-4 announcement, OpenAI emphasized that the system underwent six months of safety training, and that in internal adversarial factuality evaluations GPT-4 scored better than the latest GPT-3.5: it is “82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5.”


This means that, compared with previous models, GPT-4 produces serious nonsense far less often, and users attempting to prompt it into saying prohibited content succeed far less often.

However, that doesn’t mean the system won’t make mistakes or output harmful content. Microsoft, for example, revealed that its Bing chatbot has been powered by GPT-4 all along, yet many users found creative ways to break Bing’s guardrails, getting the bot to offer dangerous advice, threaten users, and fabricate information.

In addition, GPT-4 is still trained on data from before September 2021, which means that, like its predecessor, it lacks an effective understanding of anything that happened after that date.

“GPT-4 still has many known limitations that we are working to address, such as social bias, hallucinations, and adversarial cues,” OpenAI said.

Applications: for developers, and increasingly expensive

Of course, one obvious difference beyond performance is that GPT-4 is “bigger” than previous versions, meaning it has been trained on more data and is therefore more expensive to run. OpenAI said only that it used Microsoft Azure to train the model; it released no details about the exact model size or the hardware used to train it.

This also means that the cost of using it keeps rising. Unlike ChatGPT, this new model is not currently available for free public testing; OpenAI encourages developers to apply for access, but they will need to join a waiting list.

The new model is available to ChatGPT’s paying subscribers through ChatGPT Plus ($20 per month), and also as part of an API that lets developers pay to integrate the AI into their applications. OpenAI said several companies have already integrated GPT-4 into their products, including Duolingo, Stripe, and Khan Academy.

Of course, if you’re not a developer or a paying customer, but you really want to try something new, Microsoft’s Bing will be your best bet – Bing’s AI chatbot has been using GPT-4 for the past six weeks.

Did you feel it?

Author: VickyXiao. Source: Silicon Star. Original title: “OpenAI releases GPT-4: can recognize pictures and calculate taxes, ChatGPT gets rid of Chat and evolves again”

Risk Warning and Disclaimer

Markets carry risk; invest with caution. This article does not constitute personal investment advice, nor does it take into account the particular investment objectives, financial situation, or needs of individual users. Users should consider whether any opinions or conclusions expressed herein suit their particular circumstances. Invest accordingly at your own risk.
