Codesota · Benchmark · ESC-50

ESC-50.

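ESC-50 is a dataset of 2,000 five-second environmental audio recordings organized into 50 classes (40 clips per class). The classes span five major categories: animals, natural soundscapes and water sounds, human non-speech sounds, interior/domestic sounds, and exterior/urban noises.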

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


accuracy

Accuracy is the reported evaluation metric for ESC-50. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
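
Several leaderboard entries report accuracy as a 5-fold cross-validation (CV) figure. ESC-50 ships with five predefined folds, and a common way to arrive at one number is to score each held-out fold and average the results. The sketch below illustrates that arithmetic only. The function name `five_fold_accuracy` and the `(fold, true_label, predicted_label)` record layout are hypothetical, and it is an assumption that any given paper averages its folds exactly this way.

```python
from collections import defaultdict


def five_fold_accuracy(records):
    """Return (per-fold accuracy in percent, mean accuracy in percent).

    `records` is an iterable of (fold, true_label, predicted_label) tuples,
    where each prediction comes from a model that did not train on that fold.
    This layout is a hypothetical one chosen for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for fold, y_true, y_pred in records:
        total[fold] += 1
        correct[fold] += int(y_true == y_pred)

    # Accuracy per held-out fold, as a percentage.
    per_fold = {fold: 100.0 * correct[fold] / total[fold] for fold in sorted(total)}
    # Unweighted mean across folds (ESC-50 folds are equal in size).
    mean = sum(per_fold.values()) / len(per_fold)
    return per_fold, mean


# Toy example with two folds and made-up predictions.
toy_records = [
    (1, "dog", "dog"),
    (1, "rain", "rain"),
    (2, "dog", "cat"),
    (2, "rain", "rain"),
]
per_fold, mean = five_fold_accuracy(toy_records)
print(per_fold, mean)
```

Because every ESC-50 fold holds the same number of clips, the unweighted mean over folds equals the accuracy over all clips pooled together.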

Trust tiers for accuracy: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Notes |
|---|---|---|---|---|---|
| 01 | BEATs (iter3+) | unverified | 98.1 | 2022 | |
| 02 | BEATs | verified | 98.1 | 2023 | BEATs iterative self-labeling (Chen et al., Microsoft, ICML 2023). 98.1% 5-fold CV on ESC-50. Stated in abstract. |
| 03 | HTS-AT | verified | 97 | 2022 | HTS-AT (Chen et al., ICASSP 2022). 97.0 ± 0.2% 5-fold CV on ESC-50. Verified from GitHub repo and search results. |
| 04 | AST-P | unverified | 95.6 | 2021 | |
| 05 | AST | verified | 95.6 | 2021 | AST (Audio Spectrogram Transformer, Gong et al., MIT, INTERSPEECH 2021). 95.6% 5-fold CV on ESC-50. From abstract. |
| 06 | CLAP | verified | 93.7 | 2023 | CLAP (Contrastive Language-Audio Pretraining, Wu et al., ICASSP 2023). Zero-shot classification accuracy on ESC-50. |
| 07 | CLAP+K2C Aug. | unverified | 91 | 2022 | |
| 08 | AST-S | unverified | 88.7 | 2021 | |
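
The CLAP entry is a zero-shot score, so it is not measured the same way as the fine-tuned 5-fold CV entries. In contrastive language-audio models, zero-shot classification generally works by embedding each class name as a text prompt, embedding the audio clip, and picking the class whose text embedding is most similar to the audio embedding. The sketch below shows only that selection step. The embeddings are made-up toy vectors, the function names are hypothetical, and no real CLAP encoder or API is called.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_predict(audio_emb, text_embs):
    """Return the label whose text embedding is closest to the audio embedding.

    `text_embs` maps a class label to the embedding of its text prompt.
    In a real system both embeddings would come from pretrained encoders;
    here they are placeholders (an assumption made for illustration).
    """
    return max(text_embs, key=lambda label: cosine(audio_emb, text_embs[label]))


# Toy 3-dimensional embeddings standing in for encoder outputs.
toy_text_embs = {
    "dog": [1.0, 0.0, 0.0],
    "rain": [0.0, 1.0, 0.0],
    "siren": [0.0, 0.0, 1.0],
}
toy_audio_emb = [0.1, 0.9, 0.2]
print(zero_shot_predict(toy_audio_emb, toy_text_embs))  # prints "rain"
```

Zero-shot accuracy is then the fraction of clips for which this prediction matches the true label, with no training on ESC-50 labels.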
§ 03 · Lineage

ESC-50 in context.

Predecessors (0)
None. This is where the lineage begins.
This benchmark (1)
ESC-50 (2015-01, saturated)

Successors (2)
AudioSet (2017-03, saturating)
AudioSet replaced ESC-50 as the primary audio classification benchmark: 527 classes vs. 50, 2M clips vs. 2K, and a hierarchical ontology. Scale and coverage made it the ImageNet analogue for audio. ESC-50 became a probe task for pretrained representations.

MUSDB18 (2017-10, active)
MUSDB18 branches into music source separation, a generative audio task rather than a classification one. It belongs to a different task family entirely, so ESC-50's sound-class framework doesn't apply.
