Common Voice
Mozilla's crowdsourced multilingual speech dataset, covering 250+ languages and over 33,000 hours of audio, released under the CC0 public-domain license
Duration
33,150 hours
Languages
250+
Sample Rate
48 kHz
Published
2019-02
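As a rough illustration of what the duration and sample-rate figures imply, the sketch below converts hours of audio into raw sample counts at 48 kHz. It assumes single-channel (mono) audio, which the card does not state. The helper `samples_for_hours` is hypothetical and not part of any Common Voice tooling.

```python
# Raw sample count implied by the card's duration and sample rate.
# Assumption: mono audio by default; the card does not state the channel count.
SAMPLE_RATE_HZ = 48_000
SECONDS_PER_HOUR = 3_600


def samples_for_hours(hours: float, sample_rate_hz: int = SAMPLE_RATE_HZ, channels: int = 1) -> int:
    """Number of PCM samples in `hours` of audio at `sample_rate_hz` (hypothetical helper)."""
    return round(hours * SECONDS_PER_HOUR * sample_rate_hz * channels)


print(f"{samples_for_hours(1):,}")       # 172,800,000 samples per hour
print(f"{samples_for_hours(33_150):,}")  # 5,728,320,000,000 samples in total
```

At 48 kHz, each hour of mono audio is 172.8 million samples, so the full 33,150 hours come to roughly 5.7 trillion samples before any compression.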
Description
1. A crowdsourced project initiated by Mozilla in which volunteers record sample sentences and validate other users' recordings
2. As of December 2025, over 33,150 hours of speech data, of which 22,108 hours are community-validated
3. Covers over 250 languages, with 8 languages exceeding 1,000 hours each
4. Released under the CC0 public-domain license
5. Continuously updated through periodic new version releases
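A minimal sketch of the record-and-validate loop described above, assuming a simple vote rule: a clip counts as validated once it has at least two up-votes and more up-votes than down-votes. The threshold, the `Clip` type, and the `status` and `validated_hours` helpers are hypothetical illustrations, not Mozilla's published policy or code. The section at the end computes the validated share from the card's own totals.

```python
from dataclasses import dataclass

# Hypothetical vote threshold; the card does not specify the real rule.
REQUIRED_VOTES = 2


@dataclass
class Clip:
    """One recorded sentence plus the community votes it has received."""
    clip_id: str
    duration_s: float
    up_votes: int = 0
    down_votes: int = 0


def status(clip: Clip, required: int = REQUIRED_VOTES) -> str:
    """Classify a clip under the assumed minimum-votes-plus-majority rule."""
    if clip.up_votes >= required and clip.up_votes > clip.down_votes:
        return "validated"
    if clip.down_votes >= required and clip.down_votes > clip.up_votes:
        return "invalidated"
    return "pending"  # not enough agreeing votes yet


def validated_hours(clips: list[Clip]) -> float:
    """Total duration, in hours, of clips that pass validation."""
    return sum(c.duration_s for c in clips if status(c) == "validated") / 3600


clips = [
    Clip("a", 5.0, up_votes=2),
    Clip("b", 4.0, up_votes=1, down_votes=2),
    Clip("c", 3.0, up_votes=1),
]
print([status(c) for c in clips])  # ['validated', 'invalidated', 'pending']

# Share of community-validated speech, using the totals quoted on this card.
TOTAL_HOURS = 33_150
VALIDATED_HOURS = 22_108
share = VALIDATED_HOURS / TOTAL_HOURS
print(f"{share:.1%}")  # 66.7%
```

With the quoted figures, roughly two thirds of the recorded hours have been community-validated; the rest are pending or were rejected.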
Language Details
| Language | Duration |
|---|---|
| English | Not listed |
| Catalan | Not listed |
| Kinyarwanda | Not listed |
| Belarusian | Not listed |
| Esperanto | Not listed |
Publisher