ReazonSpeech
Large-scale Japanese ASR (automatic speech recognition) dataset containing over 35,000 hours of speech from terrestrial TV broadcasts in Japan
Duration
35,000 hours
Languages
1
Sample Rate
16 kHz
Published
2023-01
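The duration and sample rate above give a rough sense of the corpus size. The sketch below works through that arithmetic. It assumes uncompressed 16-bit mono PCM, which the card does not state; only the 35,000-hour duration and the 16 kHz rate come from the card.

```python
# Rough size estimate for the ReazonSpeech audio.
# ASSUMPTION: uncompressed 16-bit (2-byte) mono PCM; the card only gives 16 kHz.
HOURS = 35_000            # total duration from the card (v2.0)
SAMPLE_RATE_HZ = 16_000   # sample rate from the card
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM
CHANNELS = 1              # assumption: mono


def raw_pcm_bytes(hours: int, sample_rate_hz: int, bytes_per_sample: int, channels: int) -> int:
    """Size in bytes of uncompressed PCM audio of the given duration."""
    seconds = hours * 3600
    return seconds * sample_rate_hz * bytes_per_sample * channels


# Total number of audio samples across the whole corpus.
total_samples = HOURS * 3600 * SAMPLE_RATE_HZ
# Total bytes if every sample were stored as raw PCM.
total_bytes = raw_pcm_bytes(HOURS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)

print(f"samples: {total_samples:,}")
print(f"raw PCM size: {total_bytes / 1e12:.2f} TB")
```

Under these assumptions the corpus holds about 2.0 trillion samples, or roughly 4 TB of raw audio. A compressed distribution format would shrink this figure; the card does not say which format is used.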
Description
1. Natural Japanese speech collected from terrestrial TV broadcasts in Japan
2. v2.0 exceeds 35,000 hours; v1.0 contains 19,000 hours
3. Provides five subsets of different sizes
4. To comply with copyright law, the data is randomly shuffled at the utterance level so that the original TV programs cannot be reconstructed
5. Use is restricted to purposes permitted under Article 30-4 of the Japanese Copyright Act
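Utterance-level shuffling can be illustrated with a small sketch. It is not ReazonSpeech's actual pipeline: the `Utterance` record, its fields, and the `shuffle_utterances` helper are hypothetical. The example only shows the idea of randomly permuting clips so that consecutive utterances from one broadcast no longer appear in order.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    # Hypothetical record layout; the card does not describe the real schema.
    program_id: str   # which broadcast the clip came from
    index: int        # position of the clip within that broadcast
    transcript: str   # text of the clip (stand-in for audio + text)


def shuffle_utterances(utterances: list[Utterance], seed: int = 0) -> list[Utterance]:
    """Return a randomly permuted copy so clips from one program are no longer in order."""
    shuffled = list(utterances)             # copy; leave the input untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the permutation is reproducible
    return shuffled


# Two toy "programs" of five utterances each, initially in broadcast order.
corpus = [Utterance(p, i, f"{p}-{i}") for p in ("prog_a", "prog_b") for i in range(5)]
released = shuffle_utterances(corpus, seed=42)

# Shuffling alone is not enough if ordering metadata survives, so this sketch
# (by assumption) publishes only the transcripts, dropping program_id and index.
published = [u.transcript for u in released]
print(published)
```

The key design point is that the order metadata (`program_id`, `index` here) must be discarded along with the shuffle. Otherwise the original sequence could simply be re-sorted.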
Language Details
| Language | Duration |
|---|---|
| Japanese | 35,000 hours |
Publisher