330K images with 5 captions each. Standard benchmark for image captioning.
CIDEr is the reported evaluation metric for COCO Captions. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | PaLI-X-55B | verified | 149.2 | 2023 | Source |
| 02 | PaLI-17B | verified | 149.1 | 2022 | Source |
| 03 | BEiT-3 | verified | 147.6 | 2022 | Source |
| 04 | BLIP-2 (OPT 2.7B) | verified | 145.8 | 2023 | Source |
| 05 | OFA | verified | 145.3 | 2022 | Source |
| 06 | GIT2 | verified | 145.0 | 2022 | Source |
| 07 | GIT | verified | 144.8 | 2022 | Source |
| 08 | SimVLM | verified | 143.3 | 2022 | Source |
| 09 | VinVL | verified | 140.9 | 2022 | Source |
| 10 | Chameleon-SFT | unverified | 140.8 | 2024 | Paper, Code |
| 11 | BLIP | verified | 136.7 | 2022 | Source |
| 12 | CogVLM | verified | 126.4 | 2023 | Source |
CIDEr is the reported evaluation metric for COCO Captions.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | BLIP-2 | verified | 145.8 | 2023 | Paper |
| 02 | CoCa | verified | 143.6 | 2022 | Paper |
R@1 (recall at 1) is the reported evaluation metric for COCO Captions.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | BLIP ViT-L | unverified | 65.1 | 2022 | Paper, Code |
| 02 | ALIGN | unverified | 59.9 | 2021 | Paper, Code |
| 03 | AltCLIP | unverified | 42.9 | 2022 | Paper, Code |
BLEU-4 is the reported evaluation metric for COCO Captions.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | GIT | verified | 44.1 | 2022 | Source |
| 02 | GIT2 | verified | 44.1 | 2022 | Source |
| 03 | OFA | verified | 43.9 | 2022 | Source |
| 04 | BLIP-2 (OPT 2.7B) | verified | 43.7 | 2023 | Source |
| 05 | VinVL | verified | 41.0 | 2022 | Source |
| 06 | CoCa | verified | 40.9 | 2022 | Source |
| 07 | SimVLM | verified | 40.6 | 2022 | Source |
| 08 | BLIP | verified | 40.4 | 2022 | Source |
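For readers unfamiliar with the metric, the sketch below shows roughly how a sentence-level BLEU-4 score can be computed: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. It is an illustration only, not the evaluation code behind the scores above. The function names `ngrams` and `bleu4`, the whitespace tokenization, and the absence of smoothing are all assumptions made for brevity. Published COCO results are typically corpus-level and computed with the official caption evaluation toolkit, and they appear to be reported on a 0 to 100 scale, whereas this sketch returns a value between 0 and 1.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Sentence-level BLEU-4, uniform weights, no smoothing (illustrative only)."""
    # Assumption: lowercase whitespace tokenization, not the official tokenizer.
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, any zero precision makes the whole score zero.
            return 0.0
        log_precision_sum += math.log(clipped / total)

    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return bp * math.exp(log_precision_sum / 4)


if __name__ == "__main__":
    refs = ["a dog runs on the grass", "a dog is running across a lawn"]
    print(bleu4("a dog runs on the beach", refs))
```

Because every n-gram order must match at least once, very short or loosely paraphrased captions score zero here. Practical implementations usually add smoothing or aggregate counts over the whole test set before taking the geometric mean.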
SPICE is the reported evaluation metric for COCO Captions.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | SimVLM | verified | 25.4 | 2022 | Source |
| 02 | OFA | verified | 24.8 | 2022 | Source |
| 03 | CoCa | verified | 24.7 | 2022 | Source |