WenetSpeech-Chuan
First 10,000-hour-scale open-source Sichuanese speech dataset with rich annotations across multiple domains
Duration
10013 hours
Languages
1
Sample Rate
None
Published
2025-09
Description
1. Total duration exceeding 10,013 hours, with 3,714 hours of strong labels (confidence 0.9-1.0) and 6,299 hours of weak labels (confidence 0.6-0.9)
2. Data sourced from 9 domains: short videos (52.83%), entertainment (20.08%), livestreaming (18.35%), documentaries (5.36%), audiobooks (1.14%), interviews (0.89%), news (0.83%), read speech (0.48%), drama (0.05%)
3. Rich annotation dimensions including transcription text, domain labels, speaker gender, age, emotion, and other paralinguistic information
4. Built using the Chuan-Pipeline processing framework, incorporating VAD segmentation, single-speaker clustering, LLM-GER transcription error correction, and multimodal punctuation prediction
5. Audio quality distribution concentrated in the WV-MOS 2.5-4.0 range, balancing clean recordings and real-world acoustic conditions
6. Currently the largest open-source Sichuanese dialect speech dataset
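The strong/weak split in the first item above can be sketched as a threshold on a per-utterance confidence score. This is a minimal illustration, not the dataset's own tooling: the helper name `label_tier` is hypothetical, and assigning a score of exactly 0.9 to the strong tier is an assumption, since the stated ranges (0.9-1.0 and 0.6-0.9) do not say which tier owns the boundary.

```python
from typing import Optional

STRONG_MIN = 0.9  # lower bound of the strong-label range given in the description
WEAK_MIN = 0.6    # lower bound of the weak-label range given in the description


def label_tier(confidence: float) -> Optional[str]:
    """Map a confidence score to 'strong', 'weak', or None (hypothetical helper).

    Assumption: a score of exactly 0.9 counts as strong; the source does not
    specify how the shared boundary is handled.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= STRONG_MIN:
        return "strong"
    if confidence >= WEAK_MIN:
        return "weak"
    return None  # below the weak-label range; presumably not part of the release


if __name__ == "__main__":
    for score in (0.97, 0.9, 0.75, 0.4):
        print(score, label_tier(score))
```

A caller training on high-precision transcripts only might keep the utterances for which `label_tier` returns `"strong"`, and add the weak tier when more data matters more than label accuracy.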
Language Details
| Language | Duration |
|---|---|
| Sichuanese | 10013 hours |
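
The total duration in the table can be combined with the domain percentages from the description to get a rough per-domain estimate in hours. This is only a sketch: it assumes the percentages are shares of total duration (the source does not say whether they are measured by duration or by utterance count), and the function name `domain_hours` is hypothetical.

```python
TOTAL_HOURS = 10013  # total duration from the table above

# Domain shares in percent, copied from the description.
DOMAIN_SHARE_PCT = {
    "short videos": 52.83,
    "entertainment": 20.08,
    "livestreaming": 18.35,
    "documentaries": 5.36,
    "audiobooks": 1.14,
    "interviews": 0.89,
    "news": 0.83,
    "read speech": 0.48,
    "drama": 0.05,
}


def domain_hours(total_hours: float, shares_pct: dict) -> dict:
    """Estimate hours per domain, assuming shares are fractions of duration."""
    return {
        domain: round(total_hours * pct / 100, 1)
        for domain, pct in shares_pct.items()
    }


if __name__ == "__main__":
    for domain, hours in domain_hours(TOTAL_HOURS, DOMAIN_SHARE_PCT).items():
        print(f"{domain:15s} {hours:8.1f} h")
```

The published percentages are rounded, so they sum to slightly more than 100% and the per-domain estimates will not add up to exactly 10,013 hours.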
Publisher