AudioCaps is a machine learning benchmark indexed on Codesota. This page tracks published model results, top scores per metric, and the SOTA timeline for AudioCaps.
Metric: FAD (Fréchet Audio Distance) on the AudioCaps test set. Lower is better.
| Rank | Model | Notes | Source | Score (FAD) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | AudioLDM 2-AC-Large | AudioLDM 2 AudioCaps-finetuned large model (Liu et al., IEEE/ACM TASLP 2024). Best FAD on AudioCaps test set. Table II in paper. | Community | 1.42 | 2024 | Source |
| 2 | TANGO | Ghosal et al., 2023. Previous SOTA before AudioLDM 2. | Community | 1.73 | 2023 | Source |
| 3 | AudioLDM 2-Full | Liu et al., IEEE/ACM TASLP 2024. Table II in paper. | Community | 1.78 | 2024 | Source |
| 4 | AudioLDM 2-Full-Large | Liu et al., IEEE/ACM TASLP 2024. Table II in paper. | Community | 1.86 | 2024 | Source |
| 5 | AudioLDM | Liu et al., ICML 2023. Baseline comparison in AudioLDM 2 paper. | Community | 4.48 | 2023 | Source |
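
FAD is a distance between generated and reference audio statistics, so a smaller score indicates a closer match. The sketch below shows one way the listed scores could be ranked and turned into a SOTA timeline. The function names `rank_by_fad` and `sota_timeline` are hypothetical and are not part of any Codesota API. Only the publication year is listed, so the timeline assumes that entries within the same year can be ordered by score.

```python
# FAD scores on the AudioCaps test set, copied from the table on this page.
# Each row is (model, fad, year).
results = [
    ("AudioLDM", 4.48, 2023),
    ("AudioLDM 2-Full-Large", 1.86, 2024),
    ("AudioLDM 2-Full", 1.78, 2024),
    ("TANGO", 1.73, 2023),
    ("AudioLDM 2-AC-Large", 1.42, 2024),
]


def rank_by_fad(rows):
    """Return rows sorted so the lowest (best) FAD comes first."""
    return sorted(rows, key=lambda row: row[1])


def sota_timeline(rows):
    """Return (year, model, fad) entries that improved on the best FAD so far.

    Assumption: within a single year, rows are visited best-first, because
    the table gives no finer-grained publication dates.
    """
    timeline = []
    best = float("inf")
    for model, fad, year in sorted(rows, key=lambda row: (row[2], row[1])):
        if fad < best:
            best = fad
            timeline.append((year, model, fad))
    return timeline


ranked = rank_by_fad(results)
timeline = sota_timeline(results)

for rank, (model, fad, year) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: FAD {fad} ({year})")
```

Sorting in ascending order is the only change needed relative to a higher-is-better metric; for such a metric, `reverse=True` would be passed to `sorted` instead.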