After articles, programs, and pictures, generative AI has now come back around to text-to-speech (TTS). There are already plenty of speech synthesis programs and services on the market, but they either sound quite robotic (like the one on this page) or charge a fee. A recently launched AI model called “🐶 Bark” not only produces an expressive, personalized tone but can even be asked to sing (though don’t expect too much 😁). Below we show you how to install Bark on your computer, and also how to break through the 15-second limit and improve the sound quality.

https://file.notion.so/f/s/2c5099b4-dcdf-4fee-b5bf-6b5066f52a0b/bark.wav?id=6e8e3833-fe08-44e3-a22c-76d3075d2627&table=block&spaceId=a334d0e7-d31c-45c2-8365-ebe3f941e04d&expirationTimestamp=1682958507359&signature=Zpj80XWliz-slxbimLe27Pe-xz6BIxFVuN2iq9VPQ1Y One of the official examples. Producing this kind of hesitant tone is Bark’s specialty.

🐶 Bark, developed by the Suno team, generates speech with a GPT-style model (a transformer-based generative pre-trained model). Unlike other speech generators that simply “read from a script”, it has a creative element: it can produce human-like intonation, read mixed multilingual text, generate music, background noise, and simple sound effects, and mix expressions such as laughter, sighing, and crying into the voice.

Bark is particularly good at mixing multiple languages and currently supports 13 of them: English, German, Spanish, French, Hindi, Italian, Japanese, Korean, Polish, Portuguese, Russian, Turkish, and Mandarin. Arabic, Bengali, and Telugu will also be supported in the future. Unfortunately, Bark does not support Cantonese.

Bark originally also offered voice cloning, but out of concern that the AI would be used for fraud, Suno restricted it to the official, lower-quality voices.

If you want to try Bark’s speech generation, you can test it on Hugging Face Spaces or Google Colab. For the fastest and most stable generation, however, it is best to install Bark on your own computer; an NVIDIA graphics card with 4GB of VRAM is enough for it to run smoothly.

Getting started: One-click installation of WebUI

The easiest way to install Bark on your computer is Fictiverse’s “Bark Web UI”. Go to the GitHub Releases page for the Bark Web UI, click “► Assets” under the latest version to reveal the downloadable files, and download “Bark_WebUI.7z”.

“Bark Web UI” one-click installer download page: click here

Create a new folder on your computer and put “Bark_WebUI.7z” in it. Extract the file with 7-Zip, enter the extracted “Bark_WebUI” folder, and run run.bat to begin the installation.

After the Miniconda virtual environment has been installed, you will be asked whether you have an NVIDIA graphics card. If you do, answer y, and PyTorch and the related packages responsible for AI computation will then be installed.

Once that is done, run run.bat again, then hold Ctrl and click the local URL http://127.0.0.1:7860 in the command line window to open the Bark Web UI.

Instructions

To generate speech, simply type the dialogue into the prompt field and click the Launch button, and the voice file will start to be generated. The first run takes a little longer because the model has to be loaded. Once generation is complete, an audio player appears in the Result area.

Every result is saved under the same file name, audio.wav, in the Bark_WebUI folder, so if you want to keep a clip, remember to download it first by pressing “⋮” on the right side of the player.

https://plugmedia-wp-uploads.s3.ap-southeast-1.amazonaws.com/wp-content/uploads/2023/05/bark_zh_01.wav Speech results generated in Mandarin

Bark provides 10 virtual speakers for each of the 13 languages. Together with the preset announcer speaker, that makes 131 voices in total. Although each has the accent of its own region, they all handle multiple languages: they speak English when given English text and Japanese when given Japanese text, and mixing languages is no problem.
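As a rough illustration of that count, the sketch below enumerates speaker codes following the language_speaker_number pattern of codes such as zh_speaker_0 and ja_speaker_0 used later in this article. The two-letter language codes and the name “announcer” for the preset speaker are assumptions made for illustration; check the speaker list of your own installation for the real codes.

```python
# Assumed two-letter codes for the 13 supported languages (illustrative only).
LANGUAGE_CODES = [
    "en", "de", "es", "fr", "hi", "it", "ja",
    "ko", "pl", "pt", "ru", "tr", "zh",
]
SPEAKERS_PER_LANGUAGE = 10


def speaker_codes():
    """Build the list of speaker codes: one preset voice plus 10 per language."""
    codes = ["announcer"]  # assumed name for the preset speaker
    for lang in LANGUAGE_CODES:
        for index in range(SPEAKERS_PER_LANGUAGE):
            codes.append(f"{lang}_speaker_{index}")
    return codes


if __name__ == "__main__":
    codes = speaker_codes()
    print(len(codes))   # 13 languages x 10 speakers + 1 preset voice
    print(codes[:3])
```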

The one thing to watch out for is Chinese. The Chinese that Bark supports is Simplified Chinese, so entering Simplified Chinese improves reading accuracy considerably. In addition, when entering Chinese, Japanese, or Korean dialogue, remember to add spaces between sentences; otherwise the program does not know how to split the sentences.

As for tone and singing, just insert the following metatags at the appropriate places in the dialogue prompt:

[laughter] : laughing out loud

[laughs] : laughing

[sighs] : sighing

[music] : music

[gasps] : gasping

[clears throat] : clearing the throat

- or … : hesitation

♪ : put one before and one after the lyrics and they will be sung (leave a space between the ♪ and the lyrics)

CAPITAL LETTERS : emphatic tone

MAN / WOMAN : have two voices deliver dialogue lines within the same prompt
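To show how these metatags combine in practice, here is a small sketch that assembles a prompt string. The helper names sing and emphasize are hypothetical conveniences, not part of Bark; the resulting text is simply what you would paste into the prompt field.

```python
def sing(lyrics: str) -> str:
    """Wrap lyrics in ♪ marks, leaving a space between each mark and the lyrics."""
    return f"♪ {lyrics.strip()} ♪"


def emphasize(word: str) -> str:
    """Write a word in capital letters to request an emphatic tone."""
    return word.upper()


# Mix tone metatags, a hesitation mark, emphasis, and a sung line in one prompt.
prompt = " ".join([
    "Hello, [clears throat] this is a test …",
    f"and I {emphasize('really')} like this song. [laughs]",
    sing("la la la"),
])

if __name__ == "__main__":
    print(prompt)
```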

Advanced: Bark Infinity Breaks Limits

Although the Bark Web UI is convenient, it is limited to a maximum of 15 seconds of generated speech, and it offers no way to set the output file name, reduce VRAM usage, or control the generation temperature. Fortunately, another developer, JonathanFly, has built a Bark package called “🚀 Bark Infinity”. Although it has to be driven by typed commands, it not only removes the limitations above but also provides 39 additional speakers for different purposes.

Advance preparation

Before installing Bark Infinity, three programs must already be installed: git, Python 3.10.x, and CUDA Toolkit 11.7 or above. Please see our Stable Diffusion tutorial first to learn how to install them.
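If you are unsure whether these prerequisites are in place, a quick check along the following lines may help. It is only a sketch: it assumes that git and the CUDA compiler nvcc (which ships with the CUDA Toolkit) are on your PATH, and it does not verify that the CUDA Toolkit version is 11.7 or above.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Return a name -> bool map for the three prerequisites."""
    return {
        # git must be reachable from the command line
        "git": shutil.which("git") is not None,
        # the interpreter running this script should be Python 3.10.x
        "python 3.10.x": sys.version_info[:2] == (3, 10),
        # assumption: a CUDA Toolkit install puts nvcc on PATH
        "cuda toolkit (nvcc)": shutil.which("nvcc") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```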

Installation steps

Create a folder (named “Bark_Infinity” in this example);

Go into the “Bark_Infinity” folder and type cmd in the address bar to open a new command line window;

Enter git clone https://github.com/JonathanFly/bark.git and wait for the download to finish;

Enter cd bark to go into the downloaded folder;

Enter pip install . to install the required modules;

Enter pip install soundfile to install the module for handling audio files.

Instructions

The basic syntax of Bark Infinity is this:

python bark_perform.py --text_prompt "dialogue" --split_by_words 5 --filename "output_filename.wav"

Replace “dialogue” with your own text and run the command to generate the most basic speech. The parameters after “dialogue” can be used for fine-tuning (please refer to the complete parameter table here):

--list_speakers : Lists all available speaker/singer codes.

--history_prompt "speaker code" : Selects the designated speaker/singer.

--text_temp 0.7 : Dialogue generation temperature, a decimal between 0 and 1. The higher the temperature, the greater the “creativity” of the AI, and there may be unexpected surprises (or shocks). The default value is 0.7.

--waveform_temp 0.7 : Waveform generation temperature. The value has the same meaning as the dialogue temperature.

--filename "filename.wav" : The file name under which the voice result is saved. By default it is stored in the Bark_Infinity\bark\bark_samples folder.

--output_dir "folder path" : Chooses another folder in which to save voice results.

--split_by_words 5 : To break through the 15-second limit, the dialogue is divided into short segments of the specified number of words. Note that the system is designed mainly around English, where words are naturally separated by spaces, and a value of 35 works well. For Chinese, Japanese, and Korean dialogue you need to add spaces to separate sentences, and a value of 5 is more appropriate.

--use_smaller_models : Uses less VRAM, suitable for lower-end graphics cards.
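To make the effect of --split_by_words more concrete, here is a minimal sketch of whitespace-based chunking. It is not Bark Infinity’s actual code; the function below is a hypothetical stand-in that only illustrates why text without spaces cannot be divided, and why each space-separated Chinese phrase counts as a single “word”.

```python
def split_by_words(text: str, words_per_chunk: int) -> list[str]:
    """Split text into chunks of at most words_per_chunk whitespace-separated words."""
    if words_per_chunk < 1:
        raise ValueError("words_per_chunk must be at least 1")
    words = text.split()  # splits on any run of whitespace
    return [
        " ".join(words[start:start + words_per_chunk])
        for start in range(0, len(words), words_per_chunk)
    ]


if __name__ == "__main__":
    english = "The quick brown fox jumps over the lazy dog"
    print(split_by_words(english, 4))

    # Chinese has no spaces between words, so the manually added spaces
    # are the only places where the text can be divided.
    chinese = "图像生成是近月 AIGC 其中一个热门课题， 很多人都会利用 Discord"
    print(split_by_words(chinese, 2))
```

Each chunk would then be generated as its own short clip, which is how a long text can be voiced despite the 15-second limit per generation.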

Here are some examples we generated with Bark Infinity:

25-second Mandarin reading: introduction to “PCM: Self-built Stable Diffusion WebUI Image Generation Platform”

python bark_perform.py --text_prompt "图像生成是近月 AIGC 其中一个热门课题， 很多人都会利用 Discord 或者网上提供的服务， 不过要不是要轮候， 就是要付费， 有时候更会有内容限制， 玩起来不够爽。其实只要您有一部游戏电脑， 要在家中自建图像生成平台不是什么难事！" --split_by_words 5 --history_prompt "zh_speaker_0" --filename "zh_test_20230430_0417.wav"

https://plugmedia-wp-uploads.s3.ap-southeast-1.amazonaws.com/wp-content/uploads/2023/05/zh_test_20230430_0438.wav Note that the parts read in English show no abrupt change in intonation.

42-second Japanese reading: synopsis of “Oshi no Ko”

python bark_perform.py --text_prompt "田舎の産婦人科医・ゴローは、 自分に懐いていた患者で、 12歳の若さで亡くなった少女さりなの影響により アイドルオタクになっていた。 そんな彼の元に、 活動休止中の彼の推しアイドル・ 星野アイが双子を妊娠した状態で現れる。 子供を産むこともアイドル活動も諦めないというアイに 改めて魅力を感じ、 全力で応援することにしたゴローは、 彼女の主治医としてつきそう。 だがアイの出産日、 ゴローはアイのストーカーのリョースケによって 殺されてしまう。" --split_by_words 5 --history_prompt "ja_speaker_0" --text_temp 0.3 --filename "ja_test_20230430_0452.wav"

https://plugmedia-wp-uploads.s3.ap-southeast-1.amazonaws.com/wp-content/uploads/2023/05/ja_test_20230430_0452.wav With Bark Infinity, long texts can be turned into speech too, but for some reason the Japanese all comes out with a Western accent.

English speaker reading mixed Chinese-English content: a line from “Gone with the Wind”

python bark_perform.py --text_prompt "土地是 the only thing in the world worth WORKING for, worth FIGHTING for, worth DYING for. 因为它是唯一永恒的东西" --history_prompt "en_british" --filename "en_test_20230430_2126.wav"

https://plugmedia-wp-uploads.s3.ap-southeast-1.amazonaws.com/wp-content/uploads/2023/05/en_test_20230430_2126.wav As expected, the Mandarin sentences still have a foreign accent.

K-Pop AI singer performs the first verse of YOASOBI’s “Idol”

python bark_perform.py --text_prompt "♪ 無敵の笑顔で荒らすメディア、 知りたいその秘密ミステリアス、 抜けてるとこさえ彼女のエリア、 完璧で嘘つきな君は、 天才的なアイドル様 ♪" --split_by_words 3 --history_prompt "kpop_acoustic" --text_temp 1 --waveform_temp 1 --filename "ja_song_test_20230430_1818.wav"

https://plugmedia-wp-uploads.s3.ap-southeast-1.amazonaws.com/wp-content/uploads/2023/05/ja_song_test_20230430_1745.wav A melody the AI comes up with on its own always sounds rather odd…
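If you generate many clips, commands like the ones above could also be assembled from a small Python script instead of being typed by hand. The sketch below uses only the flags described in this article; the helper names build_command and run are hypothetical, and the script assumes it is started from inside the cloned bark folder. It prints the command by default and only executes it when dry_run is set to False.

```python
import subprocess
import sys


def build_command(text, filename, speaker=None, split_by_words=None,
                  text_temp=None, waveform_temp=None):
    """Assemble the argument list for one bark_perform.py call."""
    cmd = [sys.executable, "bark_perform.py",
           "--text_prompt", text, "--filename", filename]
    if speaker is not None:
        cmd += ["--history_prompt", speaker]
    if split_by_words is not None:
        cmd += ["--split_by_words", str(split_by_words)]
    if text_temp is not None:
        cmd += ["--text_temp", str(text_temp)]
    if waveform_temp is not None:
        cmd += ["--waveform_temp", str(waveform_temp)]
    return cmd


def run(cmd, dry_run=True):
    """Print the command, or execute it when dry_run is False."""
    if dry_run:
        print(" ".join(cmd))
        return
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(build_command("Hello from Bark Infinity.", "hello.wav",
                      speaker="en_british", split_by_words=35))
```

Passing the arguments as a list avoids quoting problems when the dialogue itself contains quotation marks or spaces.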

Tips: Improve the sound quality with Adobe Podcast

The sound quality of the voices Bark generates is poor, with some noise. If you want to improve it, you can use “Speech Enhancement”, a free feature of the Adobe Podcast beta.

Adobe Podcast Speech Enhancement URL: click here

All you need is an Adobe ID to log in. Drag and drop the generated voice file onto the drop area of the web page, and an enhanced version will be ready shortly. You can also listen to both versions on the page to compare them.

As an example, the author ran the K-Pop singer clip above through Adobe Podcast Speech Enhancement. Listen and judge for yourself whether the result is satisfactory.