Description

1. Collected from YouTube and Podcasts, covering multiple domains and speaking styles

2. 10,000+ hours of strongly labeled data (confidence >= 0.95), 2,400+ hours of weakly labeled data, and ~10,000 hours of unlabeled data, totaling 22,400+ hours

3. Automatic labeling using OCR and ASR technologies, with quality filtering via end-to-end label error detection

4. Provides S, M, and L training subsets for building ASR systems at different data scales

5. Strongly labeled data categorized into 10 domain groups
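
The three-way split by label confidence described above can be illustrated with a minimal sketch. It assumes a hypothetical `Segment` record (not the dataset's actual schema) and treats any labeled segment below the 0.95 threshold as weakly labeled; the lower bound for weak labels is not stated here, so that part is an assumption. The sample segments are made up for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Threshold for strongly labeled data, as stated in the description.
STRONG_CONFIDENCE = 0.95


@dataclass
class Segment:
    # Hypothetical record layout, not the dataset's real metadata format.
    segment_id: str
    duration_s: float
    confidence: Optional[float]  # None means the segment has no transcript


def label_tier(seg: Segment) -> str:
    """Return 'strong', 'weak', or 'unlabeled' for one segment."""
    if seg.confidence is None:
        return "unlabeled"
    if seg.confidence >= STRONG_CONFIDENCE:
        return "strong"
    # Assumption: every labeled segment below the threshold counts as weak.
    return "weak"


def hours_by_tier(segments: Iterable[Segment]) -> Dict[str, float]:
    """Sum segment durations (in hours) per label tier."""
    totals = {"strong": 0.0, "weak": 0.0, "unlabeled": 0.0}
    for seg in segments:
        totals[label_tier(seg)] += seg.duration_s / 3600.0
    return totals


# Made-up segments, purely illustrative.
segments = [
    Segment("a", 1800.0, 0.98),
    Segment("b", 3600.0, 0.80),
    Segment("c", 5400.0, None),
    Segment("d", 1800.0, 0.95),
]
totals = hours_by_tier(segments)
print(totals)
```

Applied to the real metadata, the same kind of tally would be expected to reproduce the per-tier hour counts listed above, provided the field names are adapted to the actual annotation files.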

Language Details

Language	Duration
Mandarin Chinese	22,400 hours

Publisher

Mobvoi, Northwestern Polytechnical University

License & Commercial Use

Resources

Paper: https://arxiv.org/abs/2110.03370
Hugging Face: https://huggingface.co/datasets/wenet-e2e/wenetspeech
OpenSLR: https://www.openslr.org/121/
Sample / Demo: https://wenet-e2e.github.io/WenetSpeech/