Name: Chinese-LiPS
Creator: Nankai University
Published: 2025-04

Description

1. Contains approximately 100 hours of speech data: 36,208 segments from 207 speakers

2. Visual modality includes both lip-reading video and speaker presentation slides

3. Presentation slides designed by domain experts to ensure content quality and visual richness

4. Speech recorded by professionals from various fields in China in quiet, natural environments; all speakers use Mandarin

5. Covers 9 topic domains: esports/gaming, automotive industry, travel exploration, sports, culture/history, science/technology, film/TV series, health/wellness, and others

6. Near-balanced gender ratio of speakers at 1:1.13 (male:female)

7. Speaker ages mostly between 20 and 30 years; average segment duration of 10 seconds, maximum of 30 seconds

8. Split into 80% training, 15% test, and 5% validation sets, with no speaker overlap across subsets

9. All components carefully edited and manually aligned to ensure precision
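
The speaker-disjoint 80/15/5 split in item 8 can be illustrated with a minimal sketch. This is not the dataset authors' procedure. It assumes a hypothetical input layout of (segment_id, speaker_id) pairs, and it applies the ratios to the number of speakers, whereas the source does not say whether the percentages refer to speakers, segments, or duration. The function name and parameters are illustrative only.

```python
import random
from collections import defaultdict


def split_by_speaker(segments, ratios=(0.80, 0.15, 0.05), seed=0):
    """Assign whole speakers to train/test/validation subsets.

    `segments` is an iterable of (segment_id, speaker_id) pairs
    (a hypothetical layout, not the dataset's actual file format).
    Because each speaker goes to exactly one subset, no speaker
    appears in two subsets.
    """
    # Group segment ids under their speaker.
    by_speaker = defaultdict(list)
    for segment_id, speaker_id in segments:
        by_speaker[speaker_id].append(segment_id)

    # Shuffle speakers reproducibly; sorting first makes the result
    # independent of the input order.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    # Ratios are applied to the speaker count (an assumption).
    n_train = round(ratios[0] * len(speakers))
    n_test = round(ratios[1] * len(speakers))
    groups = {
        "train": speakers[:n_train],
        "test": speakers[n_train:n_train + n_test],
        "validation": speakers[n_train + n_test:],  # remainder
    }
    return {
        name: {spk: by_speaker[spk] for spk in spks}
        for name, spks in groups.items()
    }
```

Splitting at the speaker level rather than the segment level is what guarantees the no-overlap property; the resulting share of segments in each subset will only approximate 80/15/5 when speakers contribute different numbers of segments.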

Language Details

Language	Duration
Mandarin Chinese	100 hours

Publisher

Nankai University

Resources

arXiv: https://arxiv.org/abs/2504.15066
GitHub: https://kiri0824.github.io/Chinese-LiPS/