When you hear “AI music”, what comes to mind? Singing robots? Automatically generated songs?
In fact, AI music is nothing new. The use of artificial intelligence to create music dates back to the 1950s. In 1951, Alan Turing, often called the “father of artificial intelligence”, used a computer to generate music, which was recorded. In 2016, Flow Machines, a system from Sony Computer Science Laboratories built on a large database of songs and styles, created “Beatles”-style melodies.
In the early years, AI music creation focused mainly on instrumental and ambient music. In the past two years, however, AI’s one-stop “musical talent” across lyrics, arrangement, accompaniment, and singing has been eye-opening.
Starting with China’s first-generation virtual singer “Luo Tianyi”, artificial intelligence has made wave after wave in the field of music: musician Taryn Southern has collaborated with artificial intelligence to create an album; Microsoft has created the “smart girl” Xiaobing, who can arrange, write lyrics, and sing; South Korea’s SM company has launched aespa, a girl group with an AI concept; and NetEase has released an original AI single… The popularity of “AI + music” continues to grow.
AI song synthesis
As AI music applications keep breaking new ground, the singing synthesis technology behind them is attracting more and more attention. Singing voice synthesis has much in common with speech synthesis, but it also has requirements of its own.
Singing synthesis technology converts musical score information and lyrics into a singing voice. Compared with speech synthesis (TTS), music unfolds continuously in time and demands greater fluency. Song synthesis therefore requires more dimensions of musical annotation as input (pitch, note duration), the output must carry more emotional rise and fall, and the sound must be continuous, all of which make the technical implementation more complicated.
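To make the input side concrete, here is a minimal Python sketch of how score information and lyrics might be represented before synthesis. The `ScoreNote` class, its field names, and the `to_timeline` helper are hypothetical illustrations and not the format of any particular system; only the MIDI note numbering and the A4 = 440 Hz reference are standard conventions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ScoreNote:
    """One annotated note (hypothetical structure for illustration)."""
    lyric: str                  # syllable sung on this note, "" for a rest
    midi_pitch: Optional[int]   # MIDI note number, None for a rest
    beats: float                # note duration in beats


def note_to_hz(midi_pitch: int) -> float:
    """Convert a MIDI note number to frequency (A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2 ** ((midi_pitch - 69) / 12)


def to_timeline(notes: List[ScoreNote], tempo_bpm: float) -> List[Tuple[float, float, str, float]]:
    """Turn a note list into (start_sec, end_sec, lyric, f0_hz) segments.

    Rests get an f0 of 0.0. The segments are back to back, reflecting the
    requirement that the sung output be continuous in time.
    """
    sec_per_beat = 60.0 / tempo_bpm
    t = 0.0
    timeline = []
    for n in notes:
        dur = n.beats * sec_per_beat
        f0 = note_to_hz(n.midi_pitch) if n.midi_pitch is not None else 0.0
        timeline.append((t, t + dur, n.lyric, f0))
        t += dur
    return timeline


if __name__ == "__main__":
    melody = [ScoreNote("la", 69, 1.0), ScoreNote("", None, 0.5), ScoreNote("la", 81, 2.0)]
    for segment in to_timeline(melody, tempo_bpm=120):
        print(segment)
```

A speech synthesis input would typically need only the text; the extra pitch and duration fields here are the additional annotation dimensions the paragraph above refers to.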
Like a human learner, an AI song synthesis system must learn from a large corpus of songs before it can “create” new musical works. An ordinary student needs at least a week to learn a song; with AI, once high-quality song data has been used to train the model, creating a song takes only a short time.
High-quality database resolves difficulties in AI song synthesis
Despite the continuous advancement of technology, there are still two major difficulties in song synthesis.
The first is the limitation of vocal range. Each singer has their own vocal range, so for songs that go beyond it, the quality of AI singing synthesis suffers to some extent.
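The range problem can be checked before synthesis. The sketch below is an illustration only: the function names are hypothetical, pitches are assumed to be MIDI note numbers, and transposing the song is a common general workaround rather than something the article says any specific system does.

```python
from typing import List, Optional, Tuple


def out_of_range_notes(midi_pitches: List[int], low: int, high: int) -> List[Tuple[int, int]]:
    """Return (index, pitch) for every note outside the singer's range [low, high]."""
    return [(i, p) for i, p in enumerate(midi_pitches) if p < low or p > high]


def best_transposition(midi_pitches: List[int], low: int, high: int,
                       max_shift: int = 12) -> Optional[int]:
    """Smallest shift in semitones that fits the whole melody into [low, high].

    Returns None if the melody is empty or no shift within +/- max_shift works.
    """
    if not midi_pitches:
        return None
    lowest, highest = min(midi_pitches), max(midi_pitches)
    candidates = [s for s in range(-max_shift, max_shift + 1)
                  if lowest + s >= low and highest + s <= high]
    return min(candidates, key=abs) if candidates else None


if __name__ == "__main__":
    melody = [60, 64, 67, 72, 76]           # C4 E4 G4 C5 E5
    print(out_of_range_notes(melody, 57, 74))
    print(best_transposition(melody, 57, 74))
```

If no transposition fits, the melody spans more than the recorded singer's range, which is the case where the article expects synthesis quality to drop.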
The second is the need for high-quality data. Singing data involves more specialist factors than speech data. The pitch, intensity, and length of the singing voice change and combine in complicated ways, so labelers need a deep understanding of music in order to label the recorded vocals finely according to pitch, melody, rhythm, singing technique, lyrics, and other content.
The quality of synthesized singing depends to a large extent on the quality of the database. Making synthesized songs sound more realistic and natural with less data, and making the results more stable, is the key to a breakthrough in AI song synthesis technology.
Biaobei Technology has worked in AI data services for many years and has built up advanced technical capabilities and a large pool of musical talent for producing voice data. To meet the more complex requirements of singing voice synthesis, Biaobei Technology has designed a professional data processing pipeline that can quickly produce high-quality singing data in different timbres and styles.
At present, Biaobei Technology has a database of nearly 5,000 Chinese songs in different styles. All of the data is recorded by trained professional singers, and Biaobei Technology supervises and guides the whole process to ensure data quality.
For audio annotation, Biaobei Technology uses the MusicXML format, which is widely compatible and records music information accurately, to capture score attributes such as note duration, sharps and flats, meter, and clef.
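As an illustration of what such an annotation holds, the following Python sketch reads a tiny hand-written MusicXML fragment with the standard library. The sample score and the `read_attributes` and `read_notes` helpers are assumptions made for this example and do not come from Biaobei's data; the element names (`divisions`, `key/fifths`, `time`, `clef`, `pitch`, `duration`, `rest`, `lyric`) are standard MusicXML. The sketch assumes whole-number `alter` values and a single `divisions` setting for the score.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only (not taken from any real dataset).
SAMPLE = """<score-partwise version="3.1">
  <part-list><score-part id="P1"><part-name>Voice</part-name></score-part></part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>2</divisions>
        <key><fifths>1</fifths></key>
        <time><beats>4</beats><beat-type>4</beat-type></time>
        <clef><sign>G</sign><line>2</line></clef>
      </attributes>
      <note>
        <pitch><step>F</step><alter>1</alter><octave>4</octave></pitch>
        <duration>2</duration>
        <lyric><text>la</text></lyric>
      </note>
      <note><rest/><duration>1</duration></note>
    </measure>
  </part>
</score-partwise>"""


def read_attributes(xml_text):
    """Return key signature (fifths), meter, and clef from the first <attributes>."""
    attrs = ET.fromstring(xml_text).find(".//attributes")
    return {
        "fifths": int(attrs.findtext("key/fifths")),   # >0 sharps, <0 flats
        "time": attrs.findtext("time/beats") + "/" + attrs.findtext("time/beat-type"),
        "clef": attrs.findtext("clef/sign"),
    }


def read_notes(xml_text):
    """Return (pitch_name, duration_in_quarter_notes, lyric) for each <note>."""
    root = ET.fromstring(xml_text)
    # <divisions> is the number of duration units per quarter note.
    divisions = int(root.findtext(".//attributes/divisions", default="1"))
    notes = []
    for note in root.iter("note"):
        quarters = int(note.findtext("duration", default="0")) / divisions
        if note.find("rest") is not None:
            notes.append(("rest", quarters, ""))
            continue
        step = note.findtext("pitch/step")
        alter = int(note.findtext("pitch/alter", default="0"))
        octave = note.findtext("pitch/octave")
        accidental = "#" * alter if alter > 0 else "b" * -alter
        notes.append((step + accidental + octave, quarters,
                      note.findtext("lyric/text", default="")))
    return notes


if __name__ == "__main__":
    print(read_attributes(SAMPLE))
    print(read_notes(SAMPLE))
```

Because duration, accidentals, meter, and clef each have a dedicated element, a labeled score can be checked and converted mechanically, which is one practical reason a structured format suits this kind of annotation.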
At the same time, to reduce the labeling error rate, Biaobei has put considerable work into distinguishing and recording notes that involve accents, pauses, falsetto, and tuplets. There are even dedicated markers for vibrato, which is harder to handle.
As shown in the figure above, during labeling Biaobei Technology provides, for each melody note, the pitch, rhythm, inversion, breath mark, rest, and lyric information, along with the corresponding pinyin.
Industry partners who are interested in the song synthesis data described above are welcome to contact us.