330K images with 5 captions each. Standard benchmark for image captioning.
2 results indexed across 1 metric. Shaded row marks current SOTA; ties broken by submission date.
| # | Model | Org | Submitted | Paper / code | CIDEr |
|---|---|---|---|---|---|
| 01 | BLIP-2 (OSS) | Salesforce | Jan 2023 | BLIP-2: Bootstrapping Language-Image Pre-training with F… | 145.80 |
| 02 | CoCa (OSS) | Google | May 2022 | CoCa: Contrastive Captioners are Image-Text Foundation M… | 143.60 |
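
The ranking rule stated above (highest CIDEr first, ties broken by submission date) can be sketched as follows. This is a minimal illustration, not the site's actual code: the `rank` helper and the `MONTHS` lookup are hypothetical names, and the page does not say which direction the tie-break runs, so the sketch assumes the earlier submission wins.

```python
# Rows taken from the table above: (model, submitted, CIDEr).
ROWS = [
    ("CoCa", "May 2022", 143.60),
    ("BLIP-2", "Jan 2023", 145.80),
]

# Month abbreviations as they appear in the "Submitted" column.
MONTHS = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def submitted_key(text):
    """Turn 'Jan 2023' into a sortable (year, month) tuple."""
    month, year = text.split()
    return (int(year), MONTHS[month])


def rank(rows):
    """Sort by CIDEr, highest first.

    Assumption: on equal scores the earlier submission ranks higher.
    """
    return sorted(rows, key=lambda r: (-r[2], submitted_key(r[1])))


ranked = rank(ROWS)
sota = ranked[0][0]  # model that would receive the shaded row
```

With the two indexed results, `sota` is `"BLIP-2"`. The tie-break only matters when two entries report the same CIDEr.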
Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.
Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.