AISHELL-1
An open-source Mandarin speech recognition dataset with 178 hours of high-quality read speech from 400 speakers
Duration
178 hours
Languages
1
Sample Rate
16 kHz
Published
2017-09
Description
1. 400 speakers from various accent regions across China
2. Training set: 120,098 utterances (340 speakers), validation set: 14,326 utterances (40 speakers), test set: 7,176 utterances (20 speakers)
3. Recorded with high-fidelity microphones in quiet indoor environments, downsampled to 16 kHz
4. Manual transcription accuracy above 95%
5. Apache 2.0 open-source license
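
The split sizes listed above can be sanity-checked with a short script. The following is a minimal sketch, not part of any official AISHELL-1 tooling: the names `SPLITS`, `TOTAL_HOURS`, and `summarize` are hypothetical. It assumes the three splits together cover the full 178 hours, so the average utterance length it derives is only a rough estimate from rounded figures.

```python
# Split sizes copied from the description on this page.
SPLITS = {
    "train": {"utterances": 120_098, "speakers": 340},
    "validation": {"utterances": 14_326, "speakers": 40},
    "test": {"utterances": 7_176, "speakers": 20},
}

# Total duration stated on this page (rounded, so derived values are approximate).
TOTAL_HOURS = 178


def summarize(splits, total_hours):
    """Return total utterances, total speakers, mean utterance length in
    seconds, and each split's share of all utterances."""
    total_utterances = sum(s["utterances"] for s in splits.values())
    total_speakers = sum(s["speakers"] for s in splits.values())
    # Assumption: the stated duration covers all three splits combined.
    mean_seconds = total_hours * 3600 / total_utterances
    shares = {
        name: s["utterances"] / total_utterances for name, s in splits.items()
    }
    return total_utterances, total_speakers, mean_seconds, shares


if __name__ == "__main__":
    utterances, speakers, mean_seconds, shares = summarize(SPLITS, TOTAL_HOURS)
    print(f"utterances: {utterances}, speakers: {speakers}")
    print(f"estimated mean utterance length: {mean_seconds:.2f} s")
    for name, share in shares.items():
        print(f"{name}: {share:.1%} of utterances")
```

The split figures add up to 141,600 utterances and 400 speakers, which matches the speaker count given in the summary. Under the stated assumption, the average utterance is roughly 4.5 seconds long.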
Language Details
| Language | Duration |
|---|---|
| Mandarin Chinese | 178 hours |
Publisher