Codesota · Benchmark · AudioSet

AudioSet.

More than 2 million human-labeled 10-second clips from YouTube videos, covering 632 audio event classes.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


mAP

mAP (mean average precision) is the reported evaluation metric for AudioSet: average precision is computed per class and then averaged over classes. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
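
As a rough illustration of the metric, the sketch below computes non-interpolated average precision per class and averages it over classes. It is a minimal sketch, not the evaluation script used by any paper on this leaderboard: the function names and toy data are hypothetical, classes with no positive clips are skipped by assumption, and published results may rely on a library routine such as scikit-learn's `average_precision_score` instead.

```python
def average_precision(labels, scores):
    """AP for one class: mean of the precision at each rank where a positive clip appears."""
    # Rank clips from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank
    # A class with no positives has no defined AP (assumption: skip it).
    return precision_sum / hits if hits else None


def mean_average_precision(label_rows, score_rows):
    """Rows are clips, columns are classes; returns the mean of per-class AP."""
    n_classes = len(label_rows[0])
    aps = []
    for c in range(n_classes):
        ap = average_precision([row[c] for row in label_rows],
                               [row[c] for row in score_rows])
        if ap is not None:
            aps.append(ap)
    return sum(aps) / len(aps)


# Toy example: 4 clips, 2 classes (hypothetical data).
toy_labels = [[1, 0], [0, 1], [1, 0], [0, 0]]
toy_scores = [[0.9, 0.2], [0.8, 0.6], [0.7, 0.9], [0.1, 0.1]]
print(round(mean_average_precision(toy_labels, toy_scores), 3))
```

Because the metric averages over classes rather than over clips, rare classes weigh as much as common ones, which matters for a dataset with heavily imbalanced labels.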

Trust tiers for mAP: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published: an earlier or same-year result already scored better.
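
The muting rule can be sketched in a few lines. This is only an illustration under stated assumptions, not Codesota's actual implementation: the `Result` type and `muted_models` function are hypothetical, "scored better" is read as strictly higher, and scores are the rounded values shown on the leaderboard.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    score: float  # mAP as displayed on the leaderboard (rounded)
    year: int


def muted_models(results):
    """Return models beaten by an earlier or same-year result with a strictly higher score."""
    muted = []
    for r in results:
        if any(o.year <= r.year and o.score > r.score for o in results):
            muted.append(r.model)
    return muted


# Rows as displayed on this page.
ROWS = [
    Result("BEATs", 0.51, 2023),
    Result("AST", 0.48, 2021),
    Result("AST (Ensemble-M)", 0.48, 2021),
    Result("HTS-AT", 0.47, 2022),
    Result("CLAP", 0.43, 2023),
]

print(muted_models(ROWS))
```

Under these assumptions, tied same-year results (the two AST rows) are not muted, since neither is strictly better than the other.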

Rank  Model             Trust       Score  Year  Links
01    BEATs             verified    0.51   2023  Source
02    AST               verified    0.48   2021  Source
03    AST (Ensemble-M)  unverified  0.48   2021  Paper, Code
04    HTS-AT            verified    0.47   2022  Source
05    CLAP              verified    0.43   2023  Source

Notes

BEATs: BEATs iterative self-labeling (Chen et al., Microsoft, ICML 2023). mAP 50.6% (0.506) on the AudioSet eval set. From abstract: "new SOTA mAP 50.6% on AudioSet-2M".

AST: Audio Spectrogram Transformer (Gong et al., MIT, INTERSPEECH 2021). mAP 0.485 on the AudioSet eval set. From abstract.

HTS-AT: HTS-AT (Chen et al., ICASSP 2022). mAP 0.471 on the AudioSet eval set. The AST paper reports results ranging from 0.459 to 0.485; HTS-AT reports 0.471 as outperforming the prior SOTA.

CLAP: CLAP (Wu et al., ICASSP 2023). mAP 0.428 on the AudioSet eval set.
§ 03 · Lineage

AudioSet in context.

This benchmark (1)
AudioSet · saturating · 2017-03

Successors (1)
Clotho · active · 2020-01
Clotho shifted the evaluation task from classification (what sounds are here?) to captioning (describe these sounds in a sentence). This scope shift was enabled by the growing capability of audio encoders trained on AudioSet.