NoCaps comprises roughly 15K images from the Open Images validation and test sets, paired with 166K human-written captions. It specifically tests zero-shot generalization to novel objects not seen during training.
CIDEr is the reported evaluation metric for NoCaps. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better.
Muted rows were not state of the art when published — an earlier or same-year result already scored better.
| Rank | Model | Trust | Score (CIDEr) | Year | Links |
|---|---|---|---|---|---|
| 01 | CogVLM-17B | verified | 128.3 | 2023 | Source |
| 02 | PaLI-X-55B | verified | 126.3 | 2023 | Source |
| 03 | PaLI-17B | verified | 124.4 | 2022 | Source |
| 04 | BLIP-2 (FlanT5XL) | verified | 123.7 | 2023 | Source |
| 05 | BLIP-2 (OPT 2.7B) | verified | 121.6 | 2023 | Source |
| 06 | BLIP ViT-L | unverified | 113.2 | 2022 | Paper, Code |
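
The muting rule stated above the table can be sketched in a few lines of Python. This is an illustration only, not Codesota's actual implementation: the `Entry` class and the function names are hypothetical, the data is copied from the six rows shown here, and the comparison works at year granularity because the table lists no finer publication dates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    score: float  # CIDEr on NoCaps, higher is better
    year: int     # publication year as listed in the table


# Rows copied from the leaderboard table above.
ENTRIES = [
    Entry("CogVLM-17B", 128.3, 2023),
    Entry("PaLI-X-55B", 126.3, 2023),
    Entry("PaLI-17B", 124.4, 2022),
    Entry("BLIP-2 (FlanT5XL)", 123.7, 2023),
    Entry("BLIP-2 (OPT 2.7B)", 121.6, 2023),
    Entry("BLIP ViT-L", 113.2, 2022),
]


def is_muted(entry: Entry, entries: list[Entry]) -> bool:
    """True if an earlier or same-year result already scored strictly higher."""
    return any(
        other.year <= entry.year and other.score > entry.score
        for other in entries
    )


def sota_when_published(entries: list[Entry]) -> list[str]:
    """Names of the models that would not be muted under this rule."""
    return [e.model for e in entries if not is_muted(e, entries)]


if __name__ == "__main__":
    print(sota_when_published(ENTRIES))
```

Applied to only the six rows listed here, the sketch leaves CogVLM-17B and PaLI-17B unmuted. The live page may draw on additional results or exact publication dates, so its muted set could differ.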