Name: GigaSpeech
Creator: SpeechColab (multi-institution collaboration)
Published: 2021-06
License: Apache-2.0

Description

1. 10,000 hours of high-quality labeled audio for supervised training, and 40,000 hours of audio in total for semi-supervised and unsupervised training

2. Sourced from audiobooks, podcasts, and YouTube, covering both read and spontaneous speaking styles

3. Proposes a novel forced alignment and segmentation pipeline to create sentence segments and filter low-quality transcriptions

4. Provides 5 training subsets of different scales: 10h, 250h, 1000h, 2500h, and 10000h
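
The card does not say how low-quality transcriptions are detected. One common approach, shown here only as a minimal sketch, is to compare each segment's transcript with the output of a speech recognizer and drop segments whose word error rate (WER) exceeds a threshold. The field names `transcript` and `asr_hypothesis`, the function names, and the threshold value are hypothetical and are not taken from the GigaSpeech pipeline.

```python
from typing import Dict, List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # Empty reference: perfect only if the hypothesis is empty too.
        return 0.0 if not hyp else 1.0

    # Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def filter_segments(segments: List[Dict[str, str]], max_wer: float) -> List[Dict[str, str]]:
    """Keep segments whose transcript agrees closely with the ASR hypothesis.

    `transcript` and `asr_hypothesis` are hypothetical field names.
    """
    return [
        seg
        for seg in segments
        if word_error_rate(seg["transcript"], seg["asr_hypothesis"]) <= max_wer
    ]


if __name__ == "__main__":
    # Illustrative data only; the threshold is an arbitrary example value.
    demo = [
        {"transcript": "the cat sat down", "asr_hypothesis": "the cat sat down"},
        {"transcript": "the cat sat down", "asr_hypothesis": "a dog ran away quickly"},
    ]
    kept = filter_segments(demo, max_wer=0.1)
    print(len(kept), "of", len(demo), "segments kept")
```

A stricter threshold gives cleaner labels at the cost of fewer retained hours, so the value would normally be tuned per subset.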

Language Details

Language	Duration
English	10000 hours

Publisher

SpeechColab (multi-institution collaboration)
