2M+ human-labeled 10-second YouTube video clips covering 632 audio event classes.
mAP (mean average precision) is the reported evaluation metric for AudioSet. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
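
As a rough illustration of what the metric measures, the sketch below computes mean average precision for a multi-label tagging task in plain Python. The function names and the toy labels and scores are hypothetical, made up for this example. Published AudioSet evaluations may handle ties or interpolation differently, so treat this as a minimal sketch of the idea and not as the scoring code behind the numbers on this page.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP for one class: mean of the precision measured at each positive clip's rank."""
    # Rank clips from most to least confident for this class.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    if hits == 0:
        raise ValueError("class has no positive clips")
    return precision_sum / hits


def mean_average_precision(
    label_matrix: Sequence[Sequence[int]],
    score_matrix: Sequence[Sequence[float]],
) -> float:
    """Unweighted mean of per-class AP; rows are clips, columns are classes."""
    num_classes = len(label_matrix[0])
    per_class = []
    for c in range(num_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        per_class.append(average_precision(labels, scores))
    return sum(per_class) / len(per_class)


# Toy data (hypothetical): 4 clips, 2 classes.
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
toy_map = mean_average_precision(toy_labels, toy_scores)
print(round(toy_map, 4))  # 0.9167
```

In the toy data, the first class ranks one negative clip above a positive one, so its AP is (1/1 + 2/3) / 2 ≈ 0.833. The second class is ranked perfectly, so its AP is 1.0. The mean of the two gives the printed value.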
Muted rows were not state of the art when published — an earlier or same-year result already scored better.
| Rank | Model | Trust | Score (mAP) | Year | Links |
|---|---|---|---|---|---|
| 01 | BEATs | verified | 0.51 | 2023 | Source ↗ |
| 02 | AST | verified | 0.48 | 2021 | Source ↗ |
| 03 | AST (Ensemble-M) | unverified | 0.48 | 2021 | Paper ↗ Code ↗ |
| 04 | HTS-AT | verified | 0.47 | 2022 | Source ↗ |
| 05 | CLAP | verified | 0.43 | 2023 | Source ↗ |
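
The muting rule described above the table can be sketched as a small check over the listed rows. This is one reading of the rule, not Codesota's actual implementation. It assumes that only the rows in this table are compared, that "scored better" means a strictly higher score, and that the rounded scores shown here are the ones used. The `Row` type and `is_muted` helper are hypothetical names.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    model: str
    score: float  # mAP as shown in the table (rounded)
    year: int


ROWS = [
    Row("BEATs", 0.51, 2023),
    Row("AST", 0.48, 2021),
    Row("AST (Ensemble-M)", 0.48, 2021),
    Row("HTS-AT", 0.47, 2022),
    Row("CLAP", 0.43, 2023),
]


def is_muted(row: Row, rows: List[Row]) -> bool:
    """Assumed rule: muted if an earlier or same-year row has a strictly higher score."""
    return any(other.year <= row.year and other.score > row.score for other in rows)


muted = [row.model for row in ROWS if is_muted(row, ROWS)]
print(muted)  # ['HTS-AT', 'CLAP']
```

Under these assumptions the two AST rows tie at 0.48, so neither mutes the other. With unrounded scores or a larger set of published results, the outcome could differ.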