AI Technology: Audio (speech recognition, speech synthesis, etc.)
The abundance of openly available audio data in English enables pretraining of automatic speech recognition/speech-to-text models on hundreds of thousands to millions of hours of recorded speech.
As a result, speech recognition systems are approaching human-level robustness in English. For other languages, performance in multilingual speech recognition tends to scale with the amount of data included from the language, or the language family, in question.
For low- to mid-resource languages with fewer speakers, the amount of openly available data may be limited, and as a consequence these languages tend to be underrepresented in large-scale efforts to train multilingual speech recognition systems.
Cultural heritage institutions such as the National Library of Sweden host large collections of audio recordings. These resources can potentially bridge some of the existing speech recognition performance gaps between Swedish and higher-resource languages.
The research team believes Swedish speech recognition can be further improved by scaling up the amount of training data.
At KBLab at the National Library of Sweden, the team has constructed an inclusive and transparent speech corpus, with emphasis on all variations of spoken Swedish, which the project will use to train speech recognition models for Swedish.
Leonora Vesterbacka Olsson, National Library of Sweden - Sweden