Multilingual LibriSpeech (MLS)
A large-scale multilingual audiobook speech dataset covering 8 languages with approximately 50,500 hours of audio, including about 44,500 hours of English.
Duration
50,500 hours
Languages
8
Sample Rate
16 kHz
Published
2020-12
Description
1. Sourced from the LibriVox audiobook project
2. Covers 8 languages: English, German, Dutch, Spanish, French, Italian, Portuguese, Polish
3. Approximately 44,500 hours of English, ~6,000 hours total for other languages
4. Provides language models and baseline ASR models for all languages
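
The hour figures above can be cross-checked with a little arithmetic. The following is a minimal sketch that uses only the approximate numbers quoted in this entry; the variable names are illustrative and not part of any MLS tooling.

```python
# Approximate figures quoted in the description (hours of audio).
ENGLISH_HOURS = 44_500
OTHER_LANGUAGES_HOURS = 6_000  # combined total for the 7 non-English languages

# Total duration implied by the two figures.
total_hours = ENGLISH_HOURS + OTHER_LANGUAGES_HOURS

# Fraction of the corpus that is English.
english_share = ENGLISH_HOURS / total_hours

print(f"Total: {total_hours:,} hours")            # Total: 50,500 hours
print(f"English share: {english_share:.1%}")      # English share: 88.1%
```

The result matches the 50,500 hours listed under Duration and shows that English makes up roughly 88% of the corpus, which matters when planning balanced multilingual training.
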
Language Details
| Language | Duration |
|---|---|
| English | 44,500 hours |
| German | Not listed |
| Dutch | Not listed |
| Spanish | Not listed |
| French | Not listed |
| Italian | Not listed |
| Portuguese | Not listed |
| Polish | Not listed |
Publisher