WenetSpeech-Wu
First large-scale multi-dimensionally annotated open-source Wu Chinese speech dataset, covering ~8,000 hours and 8 Wu sub-dialects
Duration
8000 hours
Languages
1
Sample Rate
16 kHz
Published
2026-01
Description
1. Contains approximately 8,000 hours of Wu Chinese speech data: 3.86 million speech segments with an average duration of 7.45 seconds
2. Covers 8 Wu sub-dialects: Shanghainese, Suzhounese, Shaoxingnese, Ningbonese, Hangzhounese, Jiaxingnese, Taizhouese, and Wenzhounese
3. Spans 11 domains: news, culture, vlog, entertainment, education, podcast, commentary, interviews, radio drama, music programs, and audiobooks
4. Multi-dimensional annotations: transcription text (with confidence scores), Wu-to-Mandarin translation, domain and sub-dialect labels, speaker attributes (gender, age), emotion labels, and audio quality metrics
5. Quality filtering uses DNSMOS and SNR; high-quality transcriptions are generated by fusing the outputs of multiple ASR systems with ROVER
6. A tiered data quality strategy selects subsets for different tasks, supporting ASR, TTS, speech translation, emotion recognition, and instructed TTS
7. Sourced from in-the-wild Wu Chinese speech; for approximately 37% of recordings the specific sub-dialect could not be identified
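The tiered quality strategy described above could be sketched roughly as follows. This is only an illustration of the idea of routing segments to tasks by DNSMOS, SNR, and transcription confidence. The field names (`dnsmos`, `snr_db`, `confidence`), the tier names, and every threshold value are assumptions made for this example and are not taken from the dataset or its paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    # Hypothetical per-segment metadata; the real dataset's field names may differ.
    segment_id: str
    dnsmos: float      # DNSMOS perceptual quality score
    snr_db: float      # signal-to-noise ratio in dB
    confidence: float  # transcription confidence in [0, 1]


# Illustrative tiers, ordered from strictest to most permissive.
# All threshold values are placeholders, not the dataset's actual cut-offs.
TIERS = [
    ("tts", {"dnsmos": 3.5, "snr_db": 25.0, "confidence": 0.95}),
    ("asr_finetune", {"dnsmos": 3.0, "snr_db": 15.0, "confidence": 0.85}),
    ("asr_pretrain", {"dnsmos": 2.0, "snr_db": 5.0, "confidence": 0.60}),
]


def assign_tier(seg: Segment) -> Optional[str]:
    """Return the strictest tier whose thresholds the segment meets, or None."""
    for name, limits in TIERS:
        if (
            seg.dnsmos >= limits["dnsmos"]
            and seg.snr_db >= limits["snr_db"]
            and seg.confidence >= limits["confidence"]
        ):
            return name
    return None  # segment is filtered out entirely


clean = Segment("seg-0001", dnsmos=3.8, snr_db=30.0, confidence=0.97)
noisy = Segment("seg-0002", dnsmos=2.4, snr_db=8.0, confidence=0.70)
bad = Segment("seg-0003", dnsmos=1.2, snr_db=2.0, confidence=0.30)

tiers = [assign_tier(s) for s in (clean, noisy, bad)]
```

Ordering the tiers from strictest to loosest means a segment lands in the most demanding task it qualifies for, which matches the intuition that TTS needs cleaner audio than large-scale ASR pre-training.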
Language Details
| Language | Duration |
|---|---|
| Wu Chinese | 8000 hours |
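As a rough consistency check on the 8,000-hour figure, the segment count and average segment duration quoted in the description can be multiplied out. The sketch below uses only those two published numbers; the variable and function names are made up for this example.

```python
# Figures quoted in the dataset description.
NUM_SEGMENTS = 3_860_000      # 3.86 million speech segments
AVG_SEGMENT_SECONDS = 7.45    # average segment duration in seconds


def total_hours(num_segments: int, avg_seconds: float) -> float:
    """Total audio duration in hours from segment count and mean length."""
    return num_segments * avg_seconds / 3600


hours = total_hours(NUM_SEGMENTS, AVG_SEGMENT_SECONDS)
print(f"{hours:,.0f} hours")  # prints "7,988 hours"
```

The result agrees with the stated total of approximately 8,000 hours, given that both inputs are rounded.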
Publisher
Resources
| Resource | Link |
|---|---|
| GitHub | https://github.com/ASLP-lab/WenetSpeech-Wu-Repo |
| arXiv | https://arxiv.org/abs/2601.11027 |
| Hugging Face | https://huggingface.co/ASLP-lab/WenetSpeech-Wu-Speech-Generation |
| Hugging Face | https://huggingface.co/ASLP-lab/WenetSpeech-Wu-Speech-Understanding |
| Hugging Face | https://huggingface.co/collections/ASLP-lab/wenetspeech-wu |
| Hugging Face | https://huggingface.co/datasets/ASLP-lab/WenetSpeech-Wu |
| Hugging Face | https://huggingface.co/datasets/ASLP-lab/WenetSpeech-Wu-Bench |
| GitHub | https://hujingbin1.github.io/WenetSpeechWu-Demo-Page-Public/ |
| youtu.be | https://youtu.be/h293y859QSw |