THCHS-30
A free 30-hour Mandarin Chinese speech corpus released by Tsinghua University; a classic entry-level dataset for Mandarin ASR research.
Duration: 30 hours
Languages: 1
Sample Rate: 16 kHz
Published: 2015-12
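Since the corpus is listed at 16 kHz, it can be worth confirming that the audio files match that rate before mixing THCHS-30 with data recorded at other rates. Below is a minimal sketch using Python's standard `wave` module. It assumes the audio is stored as uncompressed PCM WAV files. The function name `check_wav_format` is hypothetical and is not part of any THCHS-30 tooling.

```python
import wave

# Assumed target rate, taken from the "Sample Rate: 16 kHz" field of this card.
EXPECTED_RATE = 16_000


def check_wav_format(source, expected_rate=EXPECTED_RATE):
    """Inspect a PCM WAV file (path or binary file object).

    Returns (ok, info): ok is True when the sample rate equals expected_rate;
    info holds the rate, channel count, sample width in bytes, and duration.
    """
    with wave.open(source, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        width = wf.getsampwidth()
        frames = wf.getnframes()
    info = {
        "rate": rate,
        "channels": channels,
        "sample_width_bytes": width,
        # Guard against a malformed header reporting a zero rate.
        "duration_s": frames / rate if rate else 0.0,
    }
    return rate == expected_rate, info
```

A file that fails the check would need resampling before it is pooled with 16 kHz data. The helper only reads the header, so it stays cheap even across thousands of utterances.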
Description
1. Accent-free Mandarin speech from 50 speakers, recorded in a quiet environment
2. Training set: 10,000 utterances (30 speakers); validation set: ~900 utterances; test set: 2,495 utterances (10 speakers)
3. Includes a language model, a pronunciation dictionary, and a Kaldi-based baseline system
4. Free of charge for academic use
5. Recorded in 2000-2001; publicly released in 2015
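Kaldi-style pronunciation dictionaries are plain-text files in which each line holds a word followed by its whitespace-separated phone sequence. The sketch below reads such lines into a Python dictionary. It assumes the bundled THCHS-30 dictionary follows that layout. `parse_lexicon` and the sample entries in the usage note are hypothetical illustrations and are not copied from the corpus.

```python
def parse_lexicon(lines):
    """Map each word to a list of pronunciations (each a list of phones).

    Assumes Kaldi lexicon layout: "<word> <phone1> <phone2> ...".
    Words listed more than once keep every pronunciation, in file order.
    """
    lexicon = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, phones = parts[0], parts[1:]
        lexicon.setdefault(word, []).append(phones)
    return lexicon
```

For example, `parse_lexicon(["你好 n i3 h ao3", "好 h ao3", "好 h ao4"])` keeps both readings of 好 as separate phone lists. Passing an open file object instead of a list also works, because iterating over a text file yields its lines.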
Language Details
| Language | Duration |
|---|---|
| Mandarin Chinese | 30 hours |
Publisher