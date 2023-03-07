Google has announced results from the Universal Speech Model (USM) research it first revealed last November. The model has 2 billion parameters and was trained on 12 million hours of speech and 28 billion sentences of text spanning more than 300 languages. It currently performs speech recognition for more than 100 languages, and the long-term goal is to support more than 1,000.

According to Google, the model is trained with self-supervised learning followed by fine-tuning. In the first stage, the BEST-RQ algorithm learns the structure of speech from unlabeled audio without external supervision. This stage accounts for about 80% of the training.
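The idea behind BEST-RQ can be sketched in a few lines: a randomly initialized, never-trained projection and codebook turn each audio frame into a discrete label, and the model is then trained to predict those labels for masked frames. The sketch below is a simplified illustration in pure Python, not Google's implementation. The function names, dimensions, and masking probability are all assumptions chosen for the example.

```python
import random


def make_quantizer(feat_dim, proj_dim, codebook_size, seed=0):
    """Build a frozen random projection matrix and a frozen random codebook.

    Neither is ever trained; they only provide fixed targets.
    """
    rng = random.Random(seed)
    projection = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)]
                  for _ in range(proj_dim)]
    codebook = [[rng.gauss(0.0, 1.0) for _ in range(proj_dim)]
                for _ in range(codebook_size)]
    return projection, codebook


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else vec


def quantize(frame, projection, codebook):
    """Map one feature frame to the index of its nearest codebook entry."""
    projected = normalize([sum(w * x for w, x in zip(row, frame))
                           for row in projection])

    def squared_distance(code):
        return sum((a - b) ** 2 for a, b in zip(projected, normalize(code)))

    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i]))


def masked_prediction_targets(frames, projection, codebook,
                              mask_prob=0.3, seed=0):
    """Pick frames to mask and return (position, label) pairs.

    A speech encoder (not shown) would see the masked input and be trained
    to predict these labels at the masked positions.
    """
    rng = random.Random(seed)
    targets = []
    for position, frame in enumerate(frames):
        if rng.random() < mask_prob:
            targets.append((position, quantize(frame, projection, codebook)))
    return targets


# Hypothetical toy input: three 4-dimensional "audio feature" frames.
projection, codebook = make_quantizer(feat_dim=4, proj_dim=3, codebook_size=8)
frames = [[0.1, 0.2, 0.3, 0.4], [1.0, -1.0, 0.5, 0.0], [0.0, 0.3, -0.7, 2.0]]
print(masked_prediction_targets(frames, projection, codebook, mask_prob=0.5))
```

Because the labels come from a fixed random quantizer rather than from human transcripts, this kind of training can use large amounts of unlabeled audio.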

In addition, the model can be further pre-trained with multi-objective supervision that combines text injection, BEST-RQ, and supervised loss functions. This stage brings in knowledge from additional data so that the model better captures the content and semantics of what is spoken. The final output is then fine-tuned with a supervised loss function.

Google said that even without the final supervised fine-tuning, the model already delivers good semantic understanding and transcription quality. It has been applied to YouTube's captioning feature, where it achieves an average word error rate (WER) below 30% across 73 languages.
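Word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognizer output, divided by the number of words in the reference. A minimal sketch of that standard definition follows. The function name and the example sentences are illustrative assumptions, and real evaluations usually normalize text (case, punctuation) first.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref_words)


# Hypothetical example: one substitution and one deletion in six words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A WER below 30% therefore means fewer than 30 word-level errors per 100 reference words. Note that WER can exceed 100% when the output contains many inserted words.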

For American English, Google said the model's word error rate is 6% lower, in relative terms, than that of its current in-house state-of-the-art model. Compared with Whisper, the large speech model from OpenAI, on the 18 languages that Whisper decodes with a word error rate below 40%, the model's word error rate is on average 32.7% lower in relative terms.
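Comparisons between speech models are commonly reported as relative reductions in word error rate rather than as absolute percentages, and the two are easy to confuse. The sketch below shows the arithmetic. The function name and the WER values are hypothetical and are not taken from Google's results.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values: a baseline at 20.0% WER and a new model at 13.46% WER.
reduction = relative_wer_reduction(0.20, 0.1346)
print(f"{reduction:.1%} relative reduction")
```

In this made-up case the absolute gap is only about 6.5 percentage points, even though the relative reduction is close to a third.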

Google also highlighted that the model recognizes speech more accurately than Whisper on three benchmarks: CORAAL, which covers English as spoken by African Americans, SpeechStew, which mixes several English speech datasets, and FLEURS, which covers 102 languages. For automatic speech translation, Google said the model achieves a higher BLEU score than Whisper.

Google has published a research paper on the model and is making its API available to researchers to support further derivative research and applications.

Google has previously said that closing the gap in language understanding will open up more opportunities for application development and help bring more services to more people.

