Speech Enhancement

Speech enhancement recovers clean speech from noisy recordings. It is commonly benchmarked on VoiceBank+DEMAND (reporting PESQ, STOI, and SI-SDR) and the Microsoft DNS Challenge (reporting DNSMOS).
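Of the metrics listed above, SI-SDR is the simplest to compute directly from waveforms. The following is a minimal sketch, assuming the usual zero-mean, projection-based definition and plain Python lists as input; the function name `si_sdr` and the `eps` parameter are illustrative choices, not part of any benchmark toolkit.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better).

    Assumes both signals are equal-length sequences of float samples.
    `eps` is a small constant (an assumption of this sketch) that guards
    against division by zero and log of zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)

    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to find the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)
    target = [alpha * r for r in ref]

    # Whatever is left after removing the target counts as distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)

    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


# Usage: a 5-cycle sine plus a smaller, orthogonal 17-cycle cosine as "noise".
clean = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [0.1 * math.cos(2 * math.pi * 17 * t / 100) for t in range(100)]
noisy = [c + v for c, v in zip(clean, noise)]
score_db = si_sdr(noisy, clean)  # about 20 dB for this synthetic pair
```

Because the target is obtained by projection, rescaling the estimate leaves the score unchanged, which is the property that distinguishes SI-SDR from plain SDR. PESQ, STOI, and DNSMOS rely on perceptual models and are normally computed with dedicated packages rather than reimplemented.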

Speech enhancement is a core speech-processing task. This page lists the standard benchmarks used to evaluate models, along with current state-of-the-art results as they are indexed.

Benchmarks & SOTA

No datasets indexed for this task yet.

Related Tasks

Speaker Verification

Verifying speaker identity from voice samples.

Speech Translation

Translating spoken audio directly to another language.

Speech Recognition

Automatic speech recognition shifted from a specialized pipeline (acoustic model + language model + decoder) toward a single end-to-end model, a shift popularized by OpenAI's Whisper (2022), which was trained on 680K hours of web audio and quickly became the de facto open-source standard. Whisper large-v3 achieves a word error rate under 5% on LibriSpeech clean, while commercial APIs from Google, AWS, and Deepgram compete on noisy, accented, and multilingual speech, where error rates are 2-3x higher. The open problems are real-time streaming ASR at conversational latency (<500 ms), code-switching between languages mid-sentence, and robust recognition of domain-specific terminology (medical, legal, technical). AssemblyAI's Universal-2 and Deepgram's Nova-3 currently lead production benchmarks, but the gap with fine-tuned Whisper variants is narrow.
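The word error rate quoted above is the word-level edit distance between a reference transcript and a system hypothesis, divided by the number of reference words. Here is a minimal sketch, assuming whitespace tokenization and no text normalization (real evaluations usually normalize case and punctuation first); the function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference words.

    Assumes plain whitespace tokenization; no casing or punctuation
    normalization is applied in this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Usage: one dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Since insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference, so a "2-3x higher" error rate is a ratio of two such fractions rather than a bounded percentage.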

Something wrong or missing?

Help keep Speech Enhancement benchmarks accurate. Report outdated results, missing benchmarks, or errors.
