September 27, 2023

OpenAI's chatbot has taken an evolutionary leap that brings it closer to consumer assistants such as Apple's Siri or Amazon's Alexa. As OpenAI has announced, users can now add voice input and images to their conversations, which the system uses to make the interaction more natural and effective.

“Voice and image give you more ways to use ChatGPT in your life. Take a photo of a landmark as you travel and have a live conversation about what’s interesting. When you’re at home, take photos of your refrigerator and pantry to figure out what’s for dinner (and ask follow-up questions for a step-by-step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set and asking it to share hints with both of you,” OpenAI suggests in the article introducing the new features.

How the new feature works

The new capabilities of ChatGPT are already available in the app for Android and iPhone, initially only in English and starting with users subscribed to the Plus and Enterprise plans, with the intention of extending them to other users in the future. The update lets users put their questions to the chatbot verbally and hear it respond through a speech synthesis function, with a choice of five different voices. Interactions on the imaging and visual front are also simple: when you upload or take a photo in ChatGPT, the app responds with a description of the image and contextual information, similar to Google Lens. The updated version of ChatGPT features a headphones icon at the top right, and photo and camera icons in a menu that opens at the bottom left.

The voice and visual functions convert incoming information into text, using voice or image recognition, so that the chatbot can generate a response. The app then responds with voice or text, depending on which mode the user chooses.

For OpenAI, ChatGPT’s new voice generation technology also opens up new opportunities to license the technology to others. Spotify, for example, is already using it for a function that translates podcasts into other languages (currently only Spanish, and only for a selection of podcasts), imitating the human voice thanks to artificial intelligence.

Doubts about privacy and other critical issues

The introduction of audio and visual features is the evolutionary step developers wanted in order to create an intelligence as similar as possible to human intelligence, providing algorithms with audio and visual as well as textual information. This, like many other recent advances in generative AI, raises legitimate concerns about how OpenAI will manage the flow of voice and image data coming from users. The company has already collected a vast amount of text and image data from the web to train models such as ChatGPT and DALL-E. With the imminent arrival of a boundless volume of voice requests and images sent by users, including potential photos of faces, one question remains open: will the company use photos and voice recordings to expand the pool of data on which it trains its algorithms? Presumably so, and OpenAI has preemptively declared that users will be able to opt out of having their data used for training purposes by activating a specific function in the app. In general, OpenAI has committed to guaranteeing an ethical and safe use of its technologies: “We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigation over time, preparing for more powerful systems in the future. A strategy that becomes even more important with advanced models involving voice and images,” OpenAI said.

