Who leads the ESC-50 benchmark?

BEATs (iter3+) currently leads ESC-50 with 98.1% accuracy.

What is the state-of-the-art score on ESC-50?

The state-of-the-art result on ESC-50 is 98.1% accuracy, achieved by BEATs (iter3+) as of 2023.

How many models are tracked on ESC-50?

Codesota tracks 8 models on ESC-50.

When was the ESC-50 leaderboard last updated?

The ESC-50 leaderboard on Codesota includes results through 2023, with the earliest tracked result from 2021.

Codesota · Benchmark · ESC-50

ESC-50.

Name: ESC-50 Benchmark Results
Creator: Unknown
Published: 2021-01-01
License: https://creativecommons.org/licenses/by/4.0/

2,000 environmental audio recordings organized into 50 classes (animals, natural soundscapes, etc.).

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

accuracy

Accuracy is the reported evaluation metric for ESC-50. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
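Several leaderboard entries report accuracy as a 5-fold cross-validation figure. A minimal sketch of how such a figure could be computed is shown here, assuming the 2,000 clips are split into five evaluation folds of 400 clips each and that the reported number is the plain mean of the per-fold accuracies. The function names and the per-fold counts are hypothetical, chosen only for illustration; they are not taken from any paper on the leaderboard.

```python
from statistics import mean


def fold_accuracy(correct: int, total: int) -> float:
    """Return accuracy in percent for one held-out fold."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def cross_validated_accuracy(fold_counts: list[tuple[int, int]]) -> float:
    """Average the per-fold accuracies (unweighted mean across folds)."""
    return mean(fold_accuracy(correct, total) for correct, total in fold_counts)


# Hypothetical (correct, total) counts for five folds of 400 clips each.
example_folds = [(392, 400), (388, 400), (396, 400), (390, 400), (394, 400)]

cv_accuracy = cross_validated_accuracy(example_folds)
print(f"5-fold CV accuracy: {cv_accuracy:.1f}%")
```

Because every fold has the same size in this sketch, the unweighted mean equals the accuracy pooled over all 2,000 clips; with unequal folds the two would differ.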

Higher is better

Trust tiers for accuracy: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.

Rank	Model	Trust	Score	Year	Links	Notes
01	BEATs (iter3+)	unverified	98.1	2022	Paper, Code	
02	BEATs	verified	98.1	2023	Source	Iterative self-labeling (Chen et al., Microsoft, ICML 2023). 98.1% 5-fold CV on ESC-50. Stated in abstract.
03	HTS-AT	verified	97.0	2022	Source	Chen et al., ICASSP 2022. 97.0 ± 0.2% 5-fold CV on ESC-50. Verified from GitHub repo and search results.
04	AST-P	unverified	95.6	2021	Paper, Code	
05	AST	verified	95.6	2021	Source	Audio Spectrogram Transformer (Gong et al., MIT, INTERSPEECH 2021). 95.6% 5-fold CV on ESC-50. From abstract.
06	CLAP	verified	93.7	2023	Source	Contrastive Language-Audio Pretraining (Wu et al., ICASSP 2023). Zero-shot classification accuracy on ESC-50.
07	CLAP+K2C Aug.	unverified	91	2022	Paper, Code	
08	AST-S	unverified	88.7	2021	Paper, Code	
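The muting rule stated above the table (a row is muted when an earlier or same-year result already scored better) can be sketched as a small check over the rows. This is an illustration under stated assumptions, not Codesota's actual implementation: the `Row` type and `is_muted` function are hypothetical names, and ties are assumed not to mute a row because the rule says "scored better".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    score: float
    year: int


# Scores and years copied from the leaderboard table.
ROWS = [
    Row("BEATs (iter3+)", 98.1, 2022),
    Row("BEATs", 98.1, 2023),
    Row("HTS-AT", 97.0, 2022),
    Row("AST-P", 95.6, 2021),
    Row("AST", 95.6, 2021),
    Row("CLAP", 93.7, 2023),
    Row("CLAP+K2C Aug.", 91.0, 2022),
    Row("AST-S", 88.7, 2021),
]


def is_muted(row: Row, rows: list[Row]) -> bool:
    """True if some earlier or same-year row has a strictly higher score.

    Assumption: an equal score does not count as "better", so ties stay unmuted.
    """
    return any(o.year <= row.year and o.score > row.score for o in rows)


muted = [r.model for r in ROWS if is_muted(r, ROWS)]
print(muted)
```

Under these assumptions the zero-shot CLAP row is muted simply because higher fine-tuned scores predate it, which is worth keeping in mind when comparing rows that use different evaluation protocols.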

§ 03 · Lineage

ESC-50 in context.

Predecessors: none. This is where the lineage begins.

This benchmark (1)

saturated · 2015-01

ESC-50

Successors (2)

saturating · 2017-03

AudioSet

AudioSet replaced ESC-50 as the primary audio classification benchmark — 527 classes vs 50, 2M clips vs 2K, hierarchical ontology. Scale and coverage made it the ImageNet analogue for audio. ESC-50 became a probe task for pretrained representations.

active · 2017-10

MUSDB18

MUSDB18 branches into music source separation — a generative audio task, not a classification one. Different task family entirely; ESC-50's sound-class framework doesn't apply.
