Description

1Alignment training component of the LLaSO open-source framework, containing 12 million speech-text pairs

2Data sources include GigaSpeech (conversational speech), LibriSpeech (read narrative), LJ Speech (audiobooks), MLS (multilingual speech), VCTK (accented English)

3Covers multiple domains including conversation, narrative, audiobooks, and accented speech

4Unified into JSON-format ASR alignment tasks using 18 instruction templates

5Audio uniformly resampled to 16 kHz and converted to 128-channel mel spectrograms

Language Details

Language	Duration
English	None

Publisher

Eastern Institute of TechnologyLogic IntelligenceBeijing University of Posts and TelecommunicationsXiamen University

Resources

arXivhttps://arxiv.org/abs/2508.15418 GitHubhttps://github.com/EIT-NLP/LLaSO Hugging Facehttps://huggingface.co/datasets/YirongSun/LLaSO-Align