HiFiTTS-2
Large-scale, high-bandwidth English TTS dataset from NVIDIA with 36,700 hours of speech from 5,013 speakers
Duration
36,700 hours
Languages
1
Sample Rate
44.1 kHz (a 22.05 kHz subset is also provided)
Published
2025-06
Description
1. Contains two subsets: a 22.05 kHz subset (~36,700 hours, 5,013 speakers) and a 44.1 kHz subset (~31,700 hours, 4,631 speakers)
2. Sourced from the LibriVox audiobook project; the original audio is at 48 kHz and is downsampled to 44.1 kHz and stored as FLAC
3. Provides detailed metadata annotations that support zero-shot TTS training
4. Focused on high-bandwidth speech synthesis research
5. Public domain license, suitable for commercial use; speakers who opted out of machine-learning use have been removed
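The 48 kHz to 44.1 kHz downsampling mentioned in the description is a non-integer rate change, so it is usually expressed as a rational up/down factor. The sketch below only works out that factor and the resulting sample count from the two rates quoted above; it is not the dataset's actual processing pipeline, and the function names are hypothetical.

```python
from fractions import Fraction

# Sample rates taken from the description above.
SOURCE_RATE_HZ = 48_000
TARGET_RATE_HZ = 44_100


def resample_ratio(source_hz: int, target_hz: int) -> Fraction:
    """Return the reduced up/down factor for rational resampling."""
    return Fraction(target_hz, source_hz)


def resampled_length(num_samples: int, source_hz: int, target_hz: int) -> int:
    """Number of output samples after resampling, rounded up."""
    ratio = resample_ratio(source_hz, target_hz)
    # Ceiling division using integer arithmetic only.
    return -(-num_samples * ratio.numerator // ratio.denominator)


ratio = resample_ratio(SOURCE_RATE_HZ, TARGET_RATE_HZ)
print(ratio)  # 147/160
# One second of 48 kHz audio becomes one second of 44.1 kHz audio.
print(resampled_length(SOURCE_RATE_HZ, SOURCE_RATE_HZ, TARGET_RATE_HZ))  # 44100
```

The reduced factor, up by 147 and down by 160, is the pair of integers a polyphase resampler such as `scipy.signal.resample_poly(x, up, down)` would take; whether the dataset authors used that tool is not stated here.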
Language Details
| Language | Duration |
|---|---|
| English | 36,700 hours |
Publisher