R@10 (recall at rank 10) is one of the reported evaluation metrics for AudioCaps, listed here for text-to-audio (T->A) retrieval. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score (%) | Year | Links |
|---|---|---|---|---|---|
| 01 | CLAP (HTSAT-RoBERTa, fusion, K2C Aug.; T->A) | unverified | 83.7 | 2022 | Paper, Code |

R@5 (recall at rank 5) is a reported evaluation metric for AudioCaps.
Higher is better
| Rank | Model | Trust | Score (%) | Year | Links |
|---|---|---|---|---|---|
| 01 | CLAP (HTSAT-RoBERTa, fusion, K2C Aug.; T->A) | unverified | 71.9 | 2022 | Paper, Code |

R@1 (recall at rank 1) is a reported evaluation metric for AudioCaps.
Higher is better
| Rank | Model | Trust | Score (%) | Year | Links |
|---|---|---|---|---|---|
| 01 | CLAP (HTSAT-RoBERTa, fusion, K2C Aug.; T->A) | unverified | 35.1 | 2022 | Paper, Code |

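
The recall tables above (R 1, R 5, R 10, commonly written R@K) report the share of queries for which a correct item appears among the top K retrieved results. As a minimal sketch of how such a number might be computed from a query-by-item similarity matrix: the function name `recall_at_k`, the variables `similarity` and `relevant`, and the toy values are all illustrative assumptions, not Codesota's or any paper's evaluation code.

```python
from typing import Sequence, Set


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    relevant: Sequence[Set[int]],
    k: int,
) -> float:
    """Percentage of queries with at least one relevant item in the top k.

    similarity[i][j] is the score between query i and candidate j.
    relevant[i] is the set of candidate indices that are correct for query i.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = 0
    for scores, rel in zip(similarity, relevant):
        # Rank candidate indices from highest to lowest similarity.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if rel & set(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy data (hypothetical): 3 text queries scored against 4 audio clips.
similarity = [
    [0.9, 0.1, 0.3, 0.2],  # correct clip 0 is ranked 1st
    [0.2, 0.4, 0.8, 0.5],  # correct clip 1 is ranked 3rd
    [0.1, 0.7, 0.6, 0.3],  # correct clip 2 is ranked 2nd
]
relevant = [{0}, {1}, {2}]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(similarity, relevant, k):.1f}")
```

Because a hit at rank K is also a hit at every larger K, R@1 can never exceed R@5, which can never exceed R@10 for the same model and test set.
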
FAD (Fréchet Audio Distance) is a reported evaluation metric for AudioCaps, used for audio generation models.
Lower is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | AudioLDM 2-AC-Large | verified | 1.42 | 2024 | Source |
| 02 | TANGO | verified | 1.73 | 2023 | Source |
| 03 | AudioLDM 2-Full | verified | 1.78 | 2024 | Source |
| 04 | AudioLDM 2-Full-Large | verified | 1.86 | 2024 | Source |
| 05 | AudioLDM | verified | 4.48 | 2023 | Source |

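
For reference, the Fréchet Audio Distance is usually defined by fitting a Gaussian to embeddings of reference audio and another to embeddings of generated audio, then taking the Fréchet distance between the two. The symbols below are notation introduced here, and the choice of embedding model (which varies between papers and affects the absolute values) is not stated in this listing:

```latex
% \mu_r, \Sigma_r: mean and covariance of reference-audio embeddings
% \mu_g, \Sigma_g: mean and covariance of generated-audio embeddings
\mathrm{FAD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The value is a distance: it is zero when the two fitted Gaussians coincide and grows as their means or covariances diverge. Scores computed with different embedding models or reference sets may not be directly comparable.
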
CIDEr is a reported evaluation metric for AudioCaps, used for audio captioning models.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | Audio Flamingo 3 | unverified | 0.70 | 2025 | Paper, Code |

SPIDEr is a reported evaluation metric for AudioCaps, used for audio captioning models.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | AudioCaps baseline (TopDown+Align) | paper | 0.37 | 2026 | Source |
| 02 | EnCLAP-base | paper | 0.30 | 2026 | Source |
| 03 | Pengi | paper | 0.27 | 2026 | Source |
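
As a brief reminder of how this captioning metric relates to the CIDEr metric listed earlier, SPIDEr is conventionally the arithmetic mean of a CIDEr score and a SPICE score computed on the same captions. Which CIDEr variant a given paper uses (often CIDEr-D) is not recorded in this listing, so the relation below is a general definition and not a statement about any specific row:

```latex
\mathrm{SPIDEr} = \tfrac{1}{2}\left( \mathrm{CIDEr} + \mathrm{SPICE} \right)
```
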