ESC-50
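2,000 five-second environmental audio recordings organized into 50 classes (40 clips per class). The classes fall into five major categories: animals; natural soundscapes and water sounds; human non-speech sounds; interior/domestic sounds; and exterior/urban noises. The clips are prearranged into five folds for cross-validation.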
Benchmark Stats
Models: 4
Papers: 4
Metrics: 1
Metric: accuracy (%). Higher is better.
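Most entries in the leaderboard report accuracy under ESC-50's fold protocol: the five predefined folds are used for leave-one-fold-out cross-validation (train on four folds, test on the fifth, rotate), and the five per-fold accuracies are averaged. The sketch below shows only that bookkeeping. The `train_and_predict` callback is hypothetical and stands in for whatever training and inference a given model uses; reporting the sample (rather than population) standard deviation for the "±" figure is also an assumption, since papers may differ.

```python
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback: given training and test clip indices, return one
# predicted class index per test clip, in the same order as the test indices.
Predictor = Callable[[List[int], List[int]], List[int]]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Percentage of clips whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("expected equal-length, non-empty label sequences")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def five_fold_cv(folds: Sequence[int], labels: Sequence[int],
                 train_and_predict: Predictor) -> Tuple[float, float]:
    """Train on four predefined folds, test on the remaining one, for folds 1..5.

    Returns the mean and the sample standard deviation of per-fold accuracy.
    """
    scores = []
    for test_fold in range(1, 6):
        train_idx = [i for i, f in enumerate(folds) if f != test_fold]
        test_idx = [i for i, f in enumerate(folds) if f == test_fold]
        preds = train_and_predict(train_idx, test_idx)
        scores.append(accuracy([labels[i] for i in test_idx], preds))
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    # Toy data: 10 clips, 2 per fold; a baseline that always predicts class 0.
    folds = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
    labels = [0, 0, 0, 1, 0, 1, 1, 1, 0, 1]
    m, s = five_fold_cv(folds, labels, lambda train, test: [0] * len(test))
    print(f"{m:.1f} ± {s:.1f}")  # prints "50.0 ± 35.4"
```

Keeping the folds fixed, rather than reshuffling clips, is what makes scores from different papers comparable on this benchmark.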
| Rank | Model | Paper | Protocol | Accuracy (%) | Year | Source | Notes |
|---|---|---|---|---|---|---|---|
| 1 | BEATs | Chen et al., Microsoft, ICML 2023 | 5-fold CV | 98.1 | 2023 | Community | Iterative self-labeling pre-training. Score stated in the abstract. |
| 2 | HTS-AT | Chen et al., ICASSP 2022 | 5-fold CV | 97.0 ± 0.2 | 2022 | Community | Verified against the GitHub repository. |
| 3 | AST (Audio Spectrogram Transformer) | Gong et al., MIT, INTERSPEECH 2021 | 5-fold CV | 95.6 | 2021 | Community | Score stated in the abstract. |
| 4 | CLAP (Contrastive Language-Audio Pretraining) | Wu et al., ICASSP 2023 | Zero-shot | 93.7 | 2023 | Community | No training on ESC-50 folds, so not directly comparable with the 5-fold CV entries. |
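The CLAP entry uses a different protocol. In zero-shot classification the model is not trained on the ESC-50 folds; instead, each clip's audio embedding is compared with text embeddings derived from the 50 class names, and the most similar class is taken as the prediction. The sketch below covers only that final scoring step and assumes the embeddings were already computed. The vectors and function names are hypothetical, and CLAP's actual prompt wording and embedding size are not shown.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_predict(audio_emb: Vector, class_text_embs: Dict[str, Vector]) -> str:
    """Pick the class whose text embedding is most similar to the audio embedding."""
    return max(class_text_embs, key=lambda c: cosine(audio_emb, class_text_embs[c]))


def zero_shot_accuracy(audio_embs: List[Vector], true_classes: List[str],
                       class_text_embs: Dict[str, Vector]) -> float:
    """Percentage of clips whose zero-shot prediction matches the true class."""
    preds = [zero_shot_predict(e, class_text_embs) for e in audio_embs]
    return 100.0 * sum(p == t for p, t in zip(preds, true_classes)) / len(true_classes)


if __name__ == "__main__":
    # Hypothetical 3-d embeddings; a real model produces these with its
    # audio and text encoders.
    text_embs = {"dog": [1.0, 0.0, 0.0], "rain": [0.0, 1.0, 0.0], "siren": [0.0, 0.0, 1.0]}
    audio_embs = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.6, 0.1, 0.5]]
    truth = ["dog", "rain", "siren"]
    # The third clip is closer to "dog" than to "siren", so 2 of 3 are correct.
    print(zero_shot_accuracy(audio_embs, truth, text_embs))
```

Because no ESC-50 labels are used for training, a zero-shot score measures transfer from pre-training rather than fitting to the benchmark, which is why it sits apart from the cross-validated rows.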