Libri-Light
A 60,000-hour unlabeled English speech benchmark dataset for unsupervised and semi-supervised ASR research
Duration: 60,000 hours
Languages: 1
Sample Rate: 16 kHz
Published: 2019-12
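To get a rough sense of scale, you can combine the listed 16 kHz sample rate with the subset durations from the description to estimate raw sample counts. This is a minimal sketch under a stated assumption. The card gives only the sample rate, so 16-bit mono PCM (2 bytes per sample) is assumed here, and the byte figures are illustrative uncompressed sizes, not download sizes. The names `SUBSET_SECONDS`, `num_samples`, and `pcm_bytes` are hypothetical.

```python
# Back-of-the-envelope size estimate for the Libri-Light subsets.
# Assumption: audio is treated as 16-bit mono PCM (2 bytes per sample);
# the dataset card only states the 16 kHz sample rate.

SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # assumed 16-bit PCM, not stated on the card

# Subset durations from the card, in whole seconds so the integer math stays exact.
SUBSET_SECONDS = {
    "60Kh": 60_000 * 3600,
    "6Kh": 6_000 * 3600,
    "600h": 600 * 3600,
    "10h": 10 * 3600,
    "1h": 1 * 3600,
    "10min": 10 * 60,
}


def num_samples(seconds: int) -> int:
    """Number of audio samples in `seconds` of audio at SAMPLE_RATE_HZ."""
    return seconds * SAMPLE_RATE_HZ


def pcm_bytes(seconds: int) -> int:
    """Uncompressed size in bytes under the assumed 16-bit mono PCM format."""
    return num_samples(seconds) * BYTES_PER_SAMPLE


if __name__ == "__main__":
    for name, secs in SUBSET_SECONDS.items():
        print(f"{name:>6}: {num_samples(secs):>17,} samples, "
              f"~{pcm_bytes(secs) / 1e9:,.1f} GB as 16-bit PCM")
```

For the full 60Kh scale, this gives 3,456,000,000,000 samples, or about 6,912 GB as uncompressed 16-bit PCM. Each smaller unlabeled scale is exactly one tenth of the one above it.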
Description
1. 60,000 hours of unlabeled English speech extracted from LibriVox audiobooks
2. Over 7,000 unique speakers
3. Provides three scales of unlabeled subsets: 60K hours, 6K hours, and 600 hours
4. Includes limited labeled training sets of 10 hours, 1 hour, and 10 minutes for semi-supervised learning
5. Provides metadata including voice activity detection (VAD) segments, signal-to-noise ratio (SNR), genre, and speaker ID
6. All datasets, metrics, and baseline systems are open-sourced
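Metadata such as VAD segments, SNR, and speaker ID is typically used to filter or summarize the unlabeled audio before training. Below is a minimal sketch of that kind of step. The record layout is an assumption for illustration only: one JSON object per file, with hypothetical keys `"speaker"`, `"snr"`, and `"voice_activity"` (a list of `[start, end]` times in seconds). It is not a confirmed description of the Libri-Light metadata schema. The sketch keeps files at or above an SNR threshold and sums VAD speech time per speaker.

```python
import json
from collections import defaultdict

# Hypothetical per-file metadata records. The field names ("speaker", "snr",
# "voice_activity") are assumptions for illustration, not a confirmed schema.
EXAMPLE_METADATA = """
[
  {"file": "a.flac", "speaker": "100", "snr": 18.5, "voice_activity": [[0.0, 4.5], [5.0, 9.0]]},
  {"file": "b.flac", "speaker": "100", "snr": 6.2, "voice_activity": [[0.5, 2.5]]},
  {"file": "c.flac", "speaker": "205", "snr": 12.0, "voice_activity": [[1.0, 7.5]]}
]
"""


def speech_seconds(segments):
    """Sum the lengths of [start, end] voice-activity segments, in seconds."""
    return sum(end - start for start, end in segments)


def speech_by_speaker(records, min_snr):
    """Total VAD speech time per speaker, keeping only files with snr >= min_snr."""
    totals = defaultdict(float)
    for rec in records:
        if rec["snr"] >= min_snr:
            totals[rec["speaker"]] += speech_seconds(rec["voice_activity"])
    return dict(totals)


records = json.loads(EXAMPLE_METADATA)

if __name__ == "__main__":
    print(speech_by_speaker(records, min_snr=10.0))
```

With a 10 dB threshold, the low-SNR file `b.flac` is dropped, leaving 8.5 s of speech for speaker `"100"` and 6.5 s for speaker `"205"`. Summing VAD segments rather than file lengths counts only detected speech, which excludes leading and trailing silence.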
Language Details
| Language | Duration |
|---|---|
| English | 60000 hours |
Publisher