Description

1. Instruction tuning component of the LLaSO open-source framework, containing 13.5 million multi-task instruction samples

2. Covers 20 tasks distributed as: linguistic tasks 52%, semantic tasks 8%, paralinguistic tasks 40%

3. Supports three interaction modes: text instruction + audio input, audio instruction + text input, audio-only

4. Audio composition: 71% real-world audio, 29% synthesized speech

5. Data sources include GigaSpeech, LibriSpeech, VoxCeleb1, Common Voice, MELD, CREMA-D, and other corpora
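
As a rough illustration, the percentages above can be turned into approximate sample counts. This is only a sketch: it assumes the percentages are shares of the 13.5 million samples (the description does not say whether they are measured in samples, tasks, or audio duration), and the names `TOTAL_SAMPLES`, `TASK_SHARES`, `AUDIO_SHARES`, and `approximate_counts` are hypothetical, not part of any LLaSO tooling.

```python
# Figures quoted in the description; everything else here is illustrative.
TOTAL_SAMPLES = 13_500_000

# Assumption: percentages are shares of samples, not of the 20 tasks.
TASK_SHARES = {"linguistic": 52, "semantic": 8, "paralinguistic": 40}
AUDIO_SHARES = {"real-world": 71, "synthesized": 29}


def approximate_counts(total, shares_percent):
    """Split `total` into per-category counts from whole-number percentages."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    return {name: total * pct // 100 for name, pct in shares_percent.items()}


task_counts = approximate_counts(TOTAL_SAMPLES, TASK_SHARES)
audio_counts = approximate_counts(TOTAL_SAMPLES, AUDIO_SHARES)

for name, count in {**task_counts, **audio_counts}.items():
    print(f"{name}: about {count:,} samples")
```

Under that assumption the linguistic share comes to roughly 7.0 million samples and the synthesized-speech share to roughly 3.9 million. The counts are estimates derived from rounded percentages, not published statistics.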

Language Details

Language	Duration
English	None

Publisher

Eastern Institute of Technology, Logic Intelligence, Beijing University of Posts and Telecommunications, Xiamen University

License & Commercial Use

Resources