The photo is clear and realistic: a Shiba Inu, one of the trendiest dog breeds of recent years, stares straight into the room, its gaze a bit bewildered. So far, nothing strange: just one of the many dog pictures posted online in a continuous stream. There is one peculiarity, however: the dog is wearing a beret, like a French painter. And not only that: it is also wearing a turtleneck.
No, no dogs were harmed to take this photo. Nor could they have been, because this image does not exist: an artificial intelligence generated it. Dall-E 2 is the new project of OpenAI, the non-profit organization founded, among others, by Elon Musk, who later left in 2019, and Sam Altman, one of the architects of Gpt-3 (here an example of what it can do). It is a system that generates images from texts provided by human beings: the second generation of Dall-E, whose first version was presented in January 2021.
How Dall-E 2 works
To keep it as short and simple as possible: Dall-E 2 receives a description as input. In our case, we would have written something like: “A shiba inu dog wearing a beret and a turtleneck”. The AI processes that information, understands it somehow, interprets it, and outputs an image like the one at the top of this page.
Not only that: Dall-E 2 can also start from an image and generate variations on the theme (as in the case of its reworkings of Girl with a Pearl Earring); and again, it can modify existing images following a textual instruction, like a kind of Photoshop within everyone’s reach, guided by artificial intelligence.
The science behind image generation
At the base of Dall-E 2 there is a very deep process of scientific research, whose aim was to take artificial intelligence another step forward.
The system follows a path that starts from two fundamental steps to arrive at the ability to generate images. The first is understanding: to create pictures from text, the AI must be able to understand how words and images relate to each other. To do this, OpenAI trained Dall-E 2 using a system called Clip (the acronym stands for Contrastive Language-Image Pre-training). Clip trains two neural networks in parallel on images and their captions, collected from all over the Web. The goal is to extract the characteristics of the two components and relate them: which part of the image corresponds to which part of the text? This is the question that Clip is called to answer.
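The core of Clip’s training can be sketched in a few lines. The snippet below is a toy illustration, not OpenAI’s code: it uses random vectors in place of real neural-network embeddings and NumPy in place of a deep-learning framework, but the symmetric contrastive loss (matching image/caption pairs against all other pairs in the batch) follows the idea described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    # Project embeddings onto the unit sphere so dot products act
    # as cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss in the style of Clip.

    Row i of `image_emb` and row i of `text_emb` come from the same
    image/caption pair; every other row in the batch acts as a negative.
    """
    logits = normalize(image_emb) @ normalize(text_emb).T / temperature
    n = logits.shape[0]
    # Cross-entropy with the matching pair (the diagonal) as target,
    # computed in both directions: image -> text and text -> image.
    log_p_rows = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p_cols = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    diag = np.arange(n)
    loss_i2t = -log_p_rows[diag, diag].mean()
    loss_t2i = -log_p_cols[diag, diag].mean()
    return (loss_i2t + loss_t2i) / 2

# Toy batch: 4 image/caption pairs with 8-dimensional embeddings.
# Each "caption" embedding is deliberately close to its "image".
images = rng.normal(size=(4, 8))
texts = images + 0.1 * rng.normal(size=(4, 8))

loss_aligned = clip_contrastive_loss(images, texts)
loss_shuffled = clip_contrastive_loss(images, texts[::-1].copy())
# Correctly matched pairs yield a lower loss than mismatched ones,
# which is exactly the signal the training pushes on.
```

Training then adjusts the two networks so that matched pairs drift together and mismatched ones drift apart; the embeddings here are random stand-ins for what those networks would produce.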
Once trained, the system must then be able to create the images. For this purpose, OpenAI uses a technique called diffusion, which first transforms the text into data, looking for similarities with what has been learned through Clip. Which portion of the image is most likely related to the word “beach”? And which is the most suitable for the word “corgi”? The AI looks for affinities and then turns that data back into something understandable: in this case, the image best suited to the text, as in the example below.
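Diffusion itself rests on a simple mechanism: progressively drowning a sample in noise, then learning to walk back from noise to image. The sketch below is a minimal illustration under toy assumptions (a 1-D array stands in for an image, and a “perfect” noise predictor replaces the trained network), but the forward formula is the standard closed-form diffusion step.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward_noise(x0, betas):
    """Forward diffusion: blend the clean signal with Gaussian noise.

    After running the whole schedule, the sample keeps
    sqrt(alpha_bar) of the original and gains sqrt(1 - alpha_bar)
    of pure noise (the closed-form DDPM-style expression).
    """
    alpha_bar = np.prod(1.0 - betas)
    noise = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return xt, noise, alpha_bar

# A toy "image": a 1-D gradient standing in for pixel values.
x0 = np.linspace(-1.0, 1.0, 16)
betas = np.linspace(1e-4, 0.2, 50)  # noise schedule over 50 steps

xt, noise, alpha_bar = forward_noise(x0, betas)

# In a real system, a neural network (conditioned on the Clip text
# embedding) would *predict* the noise. Here we pretend the predictor
# is perfect and invert the forward formula to recover the original.
x0_recovered = (xt - np.sqrt(1.0 - alpha_bar) * noise) / np.sqrt(alpha_bar)
```

The generative direction runs this in reverse: start from pure noise and repeatedly subtract the predicted noise, with the text embedding steering each step toward an image that matches the prompt.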
The risks: from fake news to biases
Dall-E 2 is not just funny photos (on Instagram there are a lot of them) and scientific research. It also carries potential risks. First of all: if an image that looks real to all intents and purposes does not actually exist, how do you tell what is real and what is not? At the bottom right of the photos generated by Dall-E 2 you can see a small symbol, a sort of color palette: it indicates that the image was generated by a computer system. OpenAI told us that “we believe it is important for people to know that a photo has been created by an artificial intelligence. Furthermore, we do not generate images of real people”.
Furthermore, the availability of the system is limited. There is currently a very long waiting list, and only a few developers have been granted access. In the future, it is plausible that Dall-E 2 will become something similar to Gpt-3, that is, a tool made available to developers to build dedicated applications. In short, for now managing this problem seems simple enough; later on, it may not be.
Managing the system’s power is not the only obstacle OpenAI has to face: Dall-E 2, like many artificial intelligences, was born with a series of biases, especially in the representation of women and minorities. On Vox, the American journalist Sigal Samuel called the system “a new demonstration of artificial intelligence biases”. In particular, the article refers to an image generation test for the keyword “lawyer”, in which only white men appear, or “flight attendant”, in which, on the contrary, only white women are represented.