Audio classification assigns audio signals to predefined categories such as music genres, environmental sounds, or speaker identities.
AudioSet: more than 2 million human-labeled 10-second clips drawn from YouTube videos, annotated against an ontology of 632 audio event classes.
Leading models on AudioSet.
| # | Model | mAP | Year | Source |
|---|---|---|---|---|
| ★ | AST (Ensemble-M) | 0.485 | 2021 | paper |
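
The score in the table is mean average precision (mAP), the usual metric for multi-label audio tagging: average precision is computed per class from the ranked clip scores, then averaged over classes. The following is a minimal pure-Python sketch of that calculation, not the official AudioSet evaluation code. The function names and the toy data are hypothetical, and real evaluation scripts may differ in how they handle score ties or classes with no positive clips.

```python
def average_precision(labels, scores):
    """Average precision for one class.

    labels: list of 0/1 ground-truth flags, one per clip.
    scores: list of model scores for the same clips.
    Returns None when the class has no positive clips (AP is undefined).
    """
    # Rank clips from highest to lowest score (ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            # Precision at the rank of each positive clip.
            precisions.append(hits / rank)
    if not precisions:
        return None
    return sum(precisions) / len(precisions)


def mean_average_precision(label_matrix, score_matrix):
    """Mean of per-class AP over all classes that have at least one positive.

    Both arguments are lists of rows (clips) by columns (classes).
    """
    num_classes = len(label_matrix[0])
    aps = []
    for c in range(num_classes):
        class_labels = [row[c] for row in label_matrix]
        class_scores = [row[c] for row in score_matrix]
        ap = average_precision(class_labels, class_scores)
        if ap is not None:  # assumption: skip classes with no positives
            aps.append(ap)
    return sum(aps) / len(aps)


# Toy example: 4 clips, 2 classes (made-up data for illustration only).
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
toy_map = mean_average_precision(toy_labels, toy_scores)
print(round(toy_map, 4))
```

Because mAP averages over classes rather than clips, rare classes weigh as much as common ones, which is one reason it is preferred over plain accuracy on a long-tailed dataset.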
6 datasets tracked for this task.