Audio

Research on processing, understanding, and generating audio signals, including speech recognition, music generation, sound classification, and audio synthesis.

5 tasks, 52 datasets, 19 results

Tasks & Benchmarks

Text-to-speech

LJ Speech (The LJ Speech Dataset, 2017)

4.61 MOS (VALL-E 2)

VCTK (CSTR VCTK Corpus, 2019)

4.36 MOS (NaturalSpeech 3)

Audio Classification

AudioSet (2017)

0.48 mAP (AST, Ensemble-M)

ESC-50 (Environmental Sound Classification 50, 2015)

98.1 accuracy (BEATs iter3+)

Voice cloning

LibriTTS test-clean, Zero-Shot TTS (LibriTTS test-clean zero-shot TTS evaluation, 2019)

5.9 WER (VALL-E)

Audio-Language Models

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

OpenAudioBench - LlamaQuestions (OpenAudioBench)

Automatic Speech Recognition

CoVoST 2 (en→zh)

Common Voice

CosyVoice3 Cross-Lingual Test Set (zh→en)

MiniMax Multilingual Test Set - Chinese (MiniMax TTS Multilingual Test Set)

Open ASR Leaderboard (Open Automatic Speech Recognition Leaderboard)

SEED Seed-TTS test-zh (seed-tts-eval, the Seed-TTS evaluation test set)

SPGISpeech

Switchboard

TED-LIUM

VoiceBench Overall (VoiceBench: Benchmarking LLM-Based Voice Assistants)

VoxPopuli

VoxPopuli En

WSJ
