Shan Tianfang’s voice AI reproduction album series goes live on Himalaya

Original title: Shan Tianfang’s voice AI reproduction album series is online on Himalaya

Recently, with authorization from Beijing Shan Tianfang Art Communication Co., Ltd., Himalaya used text-to-speech (TTS) technology to faithfully restore the voice of Mr. Shan Tianfang and, for the first time, applied his AI-synthesized voice to six books of different styles, giving classics familiar to audiences a new interpretation in his distinctive storytelling style. Shan Ruilin, Mr. Shan Tianfang’s son, commented, “When I heard the TTS voice, my heart and soul were suddenly shaken, as if my father had come back to this world.”

Picking up where the story left off

Mr. Shan Tianfang is a famous Chinese storytelling performer and an inheritor of national intangible cultural heritage. In an artistic career of more than half a century, he recorded and broadcast more than 100 radio and television storytelling works, including “The Romance of the Sui and Tang Dynasties”, “The Three Heroes and Five Righteousnesses”, “The Hero in the Troubled Times”, and “The White-Browed Heroes”. These works have been broadcast on more than 500 radio and television stations across the country, with a total running time of about 6,000 hours, and 17 sets of 28 traditional storytelling manuscripts have been compiled.

Shan Tianfang’s storytelling has become an important symbol of traditional Chinese culture. His fans are found all over the country, from the elderly to children. There is even a popular saying that “wherever there is a well, people listen to Shan Tianfang.” Even today, if you take a taxi in the north, the driver may well be listening to one of his storytelling programs.

The “Shan Tianfang Voice AI Reproduction Album” released this time includes a martial arts novel full of strange people and strange events that moves listeners to tears with the sorrows and joys of life, Zhao Chenguang’s “History of Jianghu Demise: Beiping Dark Night”; a stirring work of documentary literature, Chen Tingyi’s “The Three Brothers of Mao: Three Brothers and the Republican Foundation”; a currently popular mystery novel with a plot full of twists and turns, Zijin Chen’s “Undocumented Crimes”; and a continuation of a classic storytelling work that Shan Tianfang left unfinished in his lifetime, Gong Baiyu’s “Twelve Money Darts”…

The cooperation between Himalaya and Beijing Shan Tianfang Art Communication Co., Ltd. has a long history. Himalaya has released more than 80 of Mr. Shan Tianfang’s storytelling albums, containing more than 5,000 audio tracks. These albums have always been popular with Himalaya users, and many of them have long ranked near the top of Himalaya’s comic storytelling charts. For example, “The Hero in the Troubled Times” has been played as many as 2.36 billion times on Himalaya, and “The White-Browed Heroes” as many as 1.97 billion times.

To pay tribute to Shan Lao and carry on the culture, Himalaya has also launched the “Book Take-Up-New Storytelling Inheritance Project”, hoping that more and more storytelling fans and young storytelling performers will take part in creating new storytelling works, enriching and passing on the intangible cultural heritage of storytelling.

Perfect reproduction

Three years on, listeners can once again hear the iconic “cloud covering the moon” voice, thanks to the painstaking research and development work that the Himalaya Intelligent Speech Laboratory has devoted to Mr. Shan Tianfang’s voice. The lab did a great deal of work to preserve, as far as possible, Shan Lao’s vigorous, hoarse, and unique voice and his emotional storytelling tone.

Himalaya Intelligent Speech Laboratory has long focused on the research and development of speech synthesis, recognition, speech signal processing, coding and decoding, and intelligent sound effects. It is the core department of Himalaya.

To reproduce Shan Lao’s voice and pay tribute to traditional art, Himalaya’s Intelligent Speech Lab reproduced not only his vigorous, hoarse “cloud covering the moon” voice but also his emotional, rising-and-falling delivery. When the AI-synthesized voice, which closely resembles Mr. Shan Tianfang’s own, tells a story naturally and fluently, the storyteller who could hold listeners spellbound from his first word seems to have come back to us.

Himalaya also invited professional sound effects engineers to add soundtracks and sound effects to each new Shan work, so that listeners can enjoy an immersive experience through their ears. The contribution of senior sound designers has made the world that Shan Lao describes more three-dimensional and vivid.

Compared with ordinary synthetic audio, storytelling contains many scene descriptions and many different expressions of emotion. Mr. Shan Tianfang in particular was skilled at using his voice to shape characters, and the prosody of his storytelling varies widely. There are also many colloquial pronunciations that differ considerably from standard Mandarin. For example, the character “这” (“this”) is pronounced “zhè” in Mandarin but is usually pronounced “zhèi” in storytelling. If only a current mainstream TTS framework were used for extraction and synthesis, the emotional delivery of the synthesized storytelling would be very flat, without the ups and downs of the original.

To solve this problem, Himalaya Intelligent Speech Lab independently designed a separate prosody extraction module and integrated it into the HiTTS technical framework. According to the lab, this means that however rich and varied the prosody in Mr. Shan Tianfang’s storytelling, it can be extracted and reproduced, so that Shan Lao’s AI-synthesized voice sounds like the man himself. In addition, to handle the pronunciations in Shan Lao’s storytelling that differ from standard Mandarin, the team designed an accent module and annotated these special pronunciations, so that the AI-synthesized voice preserves the original flavor of his delivery.
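The idea of annotating special pronunciations can be illustrated with a small lookup sketch. This is only an illustration under stated assumptions, not Himalaya’s accent module: the names `STANDARD_PINYIN`, `STORYTELLING_OVERRIDES`, and `to_pinyin` are hypothetical, and the only override taken from the article is “这” read as “zhèi”.

```python
# Hypothetical sketch: none of these names come from the HiTTS framework.

# Standard Mandarin readings for a few characters (pinyin with tone marks).
STANDARD_PINYIN = {
    "这": "zhè",
    "是": "shì",
    "书": "shū",
}

# Storytelling-style readings, as an annotator might record them.
# Only the entry for 这 is taken from the article.
STORYTELLING_OVERRIDES = {
    "这": "zhèi",
}


def to_pinyin(text, overrides=None):
    """Map each character to a reading, preferring an annotated override."""
    overrides = overrides or {}
    readings = []
    for char in text:
        if char in overrides:
            readings.append(overrides[char])
        elif char in STANDARD_PINYIN:
            readings.append(STANDARD_PINYIN[char])
        else:
            # Characters without a known reading pass through unchanged.
            readings.append(char)
    return readings


standard = to_pinyin("这是书")
storytelling = to_pinyin("这是书", STORYTELLING_OVERRIDES)
```

Here `standard` holds the Mandarin reading of “这” and `storytelling` holds the annotated one, while the other characters are unaffected. A real system would presumably need context-dependent rules rather than a per-character table.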

In this way, the original Shan Tianfang’s “voice” reappeared.

Sound imagination

The faithful reproduction of Shan Tianfang’s “voice” is no accident. Himalaya has worked in the TTS field for many years, and TTS technology will help it explore the possibilities of AIGC beyond its existing “UGC + PGC + PUGC” content ecosystem.

Dr. Lu Heng of Himalaya Intelligent Speech Laboratory said that TTS systems and voice selection for novels are the highlight and distinguishing feature of Himalaya TTS. Performing audio novels with a realistic, natural TTS voice is very difficult. Unlike ordinary text-to-speech, performing a novel requires learning the cadence, emotional expression, and context of the text, distinguishing between narration and dialogue, and finally delivering the work convincingly. “Himalaya has a natural advantage in this regard. After years of hard work in the audio sector, Himalaya has gathered a large amount of audiobook content and many excellent anchors. The Himalaya Intelligent Speech Lab tries to use various voices to express different emotions, themes, and channels, so there is more room for experimentation and play.”

According to Dr. Lu Heng, the TTS front-end text processing and analysis module developed by Himalaya can already perform polyphonic character recognition, prosody prediction, and style classification on text with high precision and full automation. The company has also developed a multi-emotion, multi-style, multilingual TTS model that can not only read texts with different emotions but also automatically distinguish narration from dialogue and support English, which greatly enriches the emotion and rhythm that TTS can express. Himalaya has applied for three patents related to TTS speech synthesis, including a technical framework that enables TTS voices with no English source recordings to speak English. For example, Himalaya’s technology can already speak English in Mr. Shan Tianfang’s “voice”.
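As a rough illustration of what distinguishing narration from dialogue involves, the sketch below splits text on paired curly quotation marks. It is a deliberately naive, rule-based stand-in, not the model Dr. Lu Heng describes; the function name `split_narration_dialogue` and the assumption that all dialogue sits inside “ ” marks are hypothetical.

```python
import re

# Text enclosed in curly double quotation marks (U+201C ... U+201D).
# Assumption: dialogue is always quoted this way; real novels are messier.
QUOTE_PATTERN = re.compile("\u201c([^\u201d]*)\u201d")


def split_narration_dialogue(text):
    """Return (label, segment) pairs labelled 'narration' or 'dialogue'."""
    segments = []
    cursor = 0
    for match in QUOTE_PATTERN.finditer(text):
        # Everything between the previous quote and this one is narration.
        narration = text[cursor:match.start()].strip()
        if narration:
            segments.append(("narration", narration))
        dialogue = match.group(1).strip()
        if dialogue:
            segments.append(("dialogue", dialogue))
        cursor = match.end()
    # Text after the last quoted span is also narration.
    tail = text[cursor:].strip()
    if tail:
        segments.append(("narration", tail))
    return segments


sample = "He said, \u201cCome in.\u201d Then he left."
labelled = split_narration_dialogue(sample)
```

Each labelled segment could then be routed to a different voice or speaking style. Unquoted speech, nested quotations, and speaker attribution are all beyond this sketch.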

At present, Himalaya uses TTS in the production of a variety of content, helping creators move into audio and upgrade their offerings. For example, the “Whale Express” album released by Himalaya and the Beijing News has ranked first on Himalaya’s new news album chart for several consecutive weeks. For users, the application of TTS technology will bring richer and better content. Himalaya will continue to open up the imagination of sound, letting technology empower sound and letting sound serve life.




