
“The computer talks like me”. Artificial intelligence perfectly reproduces voice, timbre and intonation

by admin

The personal computer now has my voice. When I type something into a window on a dedicated web page, the computer repeats it back and speaks as if it were me: my voice, my timbre, my intonation. Identical. Sometimes, it must be said, it gets lost, as if wandering through infinite space: it sounds like the noise of the stars in science fiction films, or the echo of the wind in a steel canyon. But these are fractions of a second, barely perceptible; then it comes back. Or rather, my voice comes back: my voice, generated by an artificial intelligence system. It works. I could make a podcast out of it. I have become a synthetic journalist.

I had an operation on my vocal cords a few days ago. Nothing serious: strain from overuse, the specialist had ruled. A year ago I hosted a live TV broadcast for eighteen hours and a small angioma had formed on one cord; it sounds like who knows what, but it is only an edema, an effusion that was never reabsorbed. The result was that my voice had grown hoarse and gravelly, as if I were Clint Eastwood; at first you say "nice", but then you find it hard to talk. Surgery, then; but first I decided to make a digital copy of my voice: to create software capable of reproducing it during the few days when I have to stay silent, and to explore one of the frontiers where the human and the technological merge until they become indistinguishable.

On this journey there are a few milestones that help us orient ourselves. The first is from 2014: Ian Goodfellow, who now works at Apple, was then a young researcher at Google Brain, the division of Google that since 2010 has worked on machine learning. There he built a model he had imagined during his doctoral studies: he showed that two artificial neural networks (simplifying a lot: two algorithms) can "learn" in a kind of game in which they compete. This "adversarial" process generates data that can pass as authentic: videos, photos, sounds, texts. Thanks to this insight, machines learned to produce convincing, human-seeming digital "objects".
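The competitive game described above can be caricatured in a few lines of code. The sketch below is purely illustrative, not Goodfellow's actual model: real generative adversarial networks are two neural networks trained by gradient descent, while here each "player" is a single number, chosen only to make the loop visible. All names and parameters are invented for this example.

```python
import random

def train_gan(real_mean=5.0, steps=2000, lr=0.01, seed=0):
    """Toy caricature of the adversarial game: one learner produces
    data, the other judges it, and each adjusts against the other."""
    rng = random.Random(seed)
    disc = 0.0  # discriminator's model of what "real" data looks like
    gen = 0.0   # generator's parameter for producing fake data
    for _ in range(steps):
        real = rng.gauss(real_mean, 1.0)  # a sample of genuine data
        fake = rng.gauss(gen, 1.0)        # the generator's forgery
        disc += lr * (real - disc)  # discriminator refines its notion of "real"
        gen += lr * (disc - fake)   # generator nudges its fakes toward that notion
    return gen

# After training, the generator produces data resembling the real distribution.
print(round(train_gan(), 2))
```

The point of the sketch is only the structure: neither player is given the answer directly, yet the generator ends up imitating the real data by trying to satisfy the judge.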


The first sensational demonstration of this theory, and the second milestone, comes in 2017: in a forum on Reddit, a user employs the expression deepfake for the first time. It refers to videos then starting to circulate in which someone does or says something they never did. In one video, former US President Obama is seen giving a speech he never gave. Deepfakes immediately become a tool for making fake pornographic videos using the faces of famous actresses, a hateful practice that for a few days has people saying this technology should be banned. It doesn't happen. On the contrary, it becomes so sophisticated and widespread that many now amuse themselves creating faces of people who do not exist. Browsing the site thispersondoesnotexist is like reliving Blade Runner: who are the humans and who are the androids? Almost impossible to tell. For reasons that are hard to fathom, the actor Tom Cruise becomes one of the favorite targets: on the net there are clips, openly fake but flawless, in which he makes funny and rambling gestures.

The third milestone is 2020: on 11 June GPT-3 is presented, an artificial intelligence capable, they say, of writing even a novel in the style of your favorite writer. Do you like Hemingway? Do you prefer Orwell? The Generative Pre-trained Transformer writes the story you want in seconds. Someone notes that it vaguely recalls the automatic love-letter generator Alan Turing experimented with in Manchester in 1952, but that was a matter of inserting nouns and adjectives, drawn from a list, into the blanks of an already written letter. Here we are obviously in another dimension.
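The 1952 approach mentioned above is simple enough to sketch in full. The word lists and template below are invented stand-ins for illustration, not the originals used in Manchester; the mechanism, though, is the same: pick words at random from fixed lists and slot them into the blanks of a pre-written letter.

```python
import random

# Invented stand-in word lists and template (not the 1952 originals).
ADJECTIVES = ["darling", "sweet", "tender", "wistful"]
NOUNS = ["affection", "longing", "devotion", "fondness"]
TEMPLATE = "My {adj} one, you fill me with {noun}. Yours, M.U.C."

def love_letter(rng=random):
    # Pick one word per slot and drop it into the pre-written blanks.
    return TEMPLATE.format(adj=rng.choice(ADJECTIVES), noun=rng.choice(NOUNS))

print(love_letter(random.Random(7)))
```

Every output is a grammatical letter, but the program understands nothing of what it writes, which is exactly the gap between this trick and a model like GPT-3.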

The announcement causes a sensation: GPT-3 is the third version of a research project of OpenAI, a laboratory founded in San Francisco in 2015, with Elon Musk among its founders and Microsoft among its financiers. A serious thing. Just nine months later, GPT-3 does not write novels but is already used by over ten thousand developers and is present in over three hundred applications. A few days ago it was announced that it now generates four and a half billion words a day. Not random words, but words that form conversations with human beings who probably do not know they are arguing with an artificial intelligence. And they don't notice the difference. In the United States there is even a book, a "dialogue" between a Google researcher and GPT-3: it is not the first book ever "assembled" by an algorithm, but it is the first in which a human being converses with an artificial intelligence, producing a meaningful, at times profound, conversation. "Is this all real or am I talking to myself?" the researcher asks at the end.


The proof that that question is now meaningless came a few days ago. A Polish journalist and researcher, Kazimierz Rajnerowicz, has put online a fifty-question test that challenges people to recognize whether a given image, face, sound or text was generated by a human or by an artificial intelligence. In theory it should be easy: a computer-generated face usually has imperfections around the neck or ears, a sound has unexpected tones, a text seems soulless even when correct. Yet the results are disheartening: people get about half of the answers right, the same score they would get by answering at random. Artificial intelligence has already won.

The question of sound brings us back to my artificial voice. Because the test also involves recognizing short pieces of music: which one was composed by a computer? Hard to say. But reproducing the voice of a human being is another matter. Yet there we are. Last summer OpenAI released Jukebox, a system that creates songs "sung" by the greats of music, most of them no longer living. Frank Sinatra? Michael Jackson? Here they are back on stage. Mind you, the songs are by no means masterpieces, but they are demonstrations of strength: they tell us where the technology is going.

This is where we have arrived: the point where reproducing someone's voice is technically possible. To what sensible purpose? According to some, a market is opening up: the world of dubbing could change forever. They are thinking about it in Seattle, where Amazon is based, and at Google, which has just put an interesting demonstration tool online. And they are working on it in Rome, at Pi Campus. Where the EUR district ends and the road points south, on the right there are villas housing a center that is part startup accelerator and part artificial intelligence laboratory-school. Here, several years ago, Marco Trombetti and his wife founded Translated, perhaps the best Italian startup in circulation: a platform for professional translators supported by artificial intelligence to produce better translations. Now he is about to launch MateDub, "the first dubbing tool using voices produced by an artificial intelligence trained by listening to the best voice actors".


I went there a few days ago: for almost an hour (a perfect result takes at least two hours) they had me read a text; the recording was entrusted to MateDub. After one day my voice was recognizable, after two it was good, after three it was still improving. The machine was learning to speak like me. The goal, says Trombetti, is not to replace voice actors, just as Translated did not replace translators but changed the way they work. Think of a digital voice marketplace, a catalog in which producers of audiovisual content can choose the best voice. And buy it.

We will see whether that is really how it goes. In the meantime, a few days ago a video made by a youtuber appeared on the net in which you can see and hear the rapper Eminem singing an invented song with an explicitly feminist text. Let's recap the process, because it holds all the pieces together: the lyrics were generated by ShortlyAI, an artificial intelligence based on GPT-3; it was enough to give the machine the title "Eminem's new song is an attack on patriarchy. It takes a stand against males and in defense of women", and from that title GPT-3 wrote the complete verses. The lyrics were then sung in Eminem's voice, recreated by a youtuber who claims to make "synthetic song parodies"; finally, the rapper's lips were synchronized in the video, which has been circulating since mid-March. If you didn't know Eminem, you might believe it.
