Who leads the WildASR benchmark?

Gemini 3 Pro currently leads WildASR with a CER of 6.10% on FLEURS clean ZH (lower is better).

What is the state-of-the-art score on WildASR?

The best tracked result on WildASR is a CER of 6.10%, achieved by Gemini 3 Pro as of 2025.

How many models are tracked on WildASR?

Codesota tracks 7 models on WildASR across 2 metrics (CER and WER).

When was the WildASR leaderboard last updated?

The WildASR leaderboard on Codesota includes results through 2025.

WildASR.

Name: WildASR Benchmark Results
Creator: Unknown
Published: 2025-01-01
License: https://creativecommons.org/licenses/by/4.0/

Multilingual (English, Chinese, Japanese, Korean) diagnostic benchmark evaluating ASR robustness across three out-of-distribution dimensions: environmental degradation (reverberation, noise, clipping), demographic shift (accents, children, older speakers), and linguistic diversity (code-switching, short utterances, incomplete speech). Uses WER for English and CER for CJK languages.
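Both metrics named above are edit-distance error rates: WER counts word-level edits and CER counts character-level edits, each divided by the reference length. The following is a minimal sketch under stated assumptions, not the WildASR paper's scoring code. The function names are illustrative, and text normalization (casing, punctuation, number formatting) is left out because this page does not describe it.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent; words are split on whitespace (assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent; whitespace is ignored (assumption)."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Illustrative inputs, not taken from the benchmark.
english_wer = wer("the cat sat on the mat", "the cat sat on mat")
chinese_cer = cer("今天天气很好", "今天天汽很好")
```

Character-level scoring suits Chinese, Japanese, and Korean because these scripts do not reliably separate words with spaces, so a whitespace-based word split would not give meaningful units.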

§ 01 · Leaderboard

Results by metric.

CER

CER (character error rate) is the metric WildASR reports for Chinese, Japanese, and Korean. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Lower is better

Trust tiers for CER: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.

Rank	Model	Test set and metric	Trust	Score (%)	Year	Source
01	Gemini 3 Pro	FLEURS clean ZH CER	verified	6.10	2025	WildASR paper, Appendix G
02	GPT-4o Transcribe	FLEURS clean ZH CER	verified	6.40	2025	WildASR paper, Appendix G
03	Gemini 2.5 Pro	FLEURS clean ZH CER	verified	6.70	2025	WildASR paper, Appendix G
04	Whisper Large V3	FLEURS clean ZH CER	verified	7.50	2025	WildASR paper, Appendix G
05	Scribe V1	FLEURS clean ZH CER	verified	8.70	2025	WildASR paper, Appendix G
06	Qwen2-Audio	FLEURS clean ZH CER	verified	9.10	2025	WildASR paper, Appendix G
07	Nova 2	FLEURS clean ZH CER	verified	10.10	2025	WildASR paper, Appendix G
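The muted-row rule stated above the table can be read as a simple predicate: a row is muted when some result from the same year or earlier has a better (lower) score. The sketch below applies that reading to the CER rows. The `Row` type and `is_muted` function are hypothetical names, not Codesota code, and treating tied scores as not muting each other is an assumption, since the page does not say how ties or dates finer than a year are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    score: float  # error rate in percent, lower is better
    year: int


def is_muted(row, rows):
    """True if an earlier or same-year row has a strictly lower score.

    Assumption: ties do not mute each other.
    """
    return any(o.year <= row.year and o.score < row.score for o in rows)


# CER rows from the table above.
cer_rows = [
    Row("Gemini 3 Pro", 6.10, 2025),
    Row("GPT-4o Transcribe", 6.40, 2025),
    Row("Gemini 2.5 Pro", 6.70, 2025),
    Row("Whisper Large V3", 7.50, 2025),
    Row("Scribe V1", 8.70, 2025),
    Row("Qwen2-Audio", 9.10, 2025),
    Row("Nova 2", 10.10, 2025),
]

muted_models = [r.model for r in cer_rows if is_muted(r, cer_rows)]
```

Because every row here carries the year 2025, this reading mutes every model except Gemini 3 Pro.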

WER

WER (word error rate) is the metric WildASR reports for English. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Lower is better

Trust tiers for WER: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.

Rank	Model	Test set and metric	Trust	Score (%)	Year	Source
01	Gemini 3 Pro	FLEURS clean EN WER	verified	2.80	2025	WildASR paper, Appendix G
02	GPT-4o Transcribe	FLEURS clean EN WER	verified	2.80	2025	WildASR paper, Appendix G
03	Gemini 2.5 Pro	FLEURS clean EN WER	verified	3.60	2025	WildASR paper, Appendix G
04	Scribe V1	FLEURS clean EN WER	verified	3.60	2025	WildASR paper, Appendix G
05	Whisper Large V3	FLEURS clean EN WER	verified	4.20	2025	WildASR paper, Appendix G
06	Qwen2-Audio	FLEURS clean EN WER	verified	5.80	2025	WildASR paper, Appendix G
07	Nova 2	FLEURS clean EN WER	verified	6.00	2025	WildASR paper, Appendix G

Lineage

WildASR in context.

See full speech recognition benchmarks lineage →

Predecessors (1)

active · 2022-05

FLEURS

FLEURS evaluates multilingual generalisation; WildASR evaluates naturalness: real ambient noise, spontaneous speech, code-switching, and domain diversity. This lineage is where attention in foundation-model ASR evaluation is currently focused.

This benchmark (1)

active · 2024-01

WildASR

Successors: none yet. WildASR is the current frontier.