MMSU is a comprehensive multi-task benchmark for spoken language understanding and reasoning, comprising 5,000 audio-question-answer triplets across 47 distinct tasks. It covers a wide range of linguistic phenomena, including phonetics, prosody, syntax, semantics, and paralinguistics.
No results indexed yet. Be the first to submit a score.
Submit a checkpoint and a reproduction script. We will run it and publish the score; if it takes the top spot, we will annotate that step on the progress chart with your name.