Codesota · Benchmark · COCO Captions

COCO Captions.

About 330K images, each with five human-written captions. The standard benchmark for image captioning.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


CIDEr

CIDEr is one of the evaluation metrics reported for COCO Captions. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for CIDEr: verified, paper, vendor, community, unverified
Rank | Model | Trust | Score | Year | Links | Notes
01 | PaLI-X-55B | verified | 149.2 | 2023 | Source | PaLI-X 55B (scaling up multilingual vision-language). Google, 2023. CIDEr on Karpathy test split.
02 | PaLI-17B | verified | 149.1 | 2022 | Source | PaLI (Pathways Language and Image model) 17B. Google Research, ICLR 2023. CIDEr on Karpathy test split without CIDEr optimization.
03 | BEiT-3 | verified | 147.6 | 2022 | Source | BEiT-3 (Image as a Foreign Language). Microsoft, CVPR 2023. CIDEr on Karpathy test split.
04 | BLIP-2 (OPT 2.7B) | verified | 145.8 | 2023 | Source | BLIP-2 with frozen OPT-2.7B. Salesforce, ICML 2023. CIDEr on Karpathy test split.
05 | OFA | verified | 145.3 | 2022 | Source | OFA-Huge (Unifying Architectures, Tasks, and Modalities). Alibaba DAMO, ICML 2022. CIDEr on Karpathy test split.
06 | GIT2 | verified | 145 | 2022 | Source | GIT2 (5.1B parameters). Microsoft, 2022. CIDEr on Karpathy test split.
07 | GIT | verified | 144.8 | 2022 | Source | GIT (Generative Image-to-text Transformer). Microsoft, 2022. CIDEr on Karpathy test split.
08 | SimVLM | verified | 143.3 | 2022 | Source | SimVLM large. ICLR 2022. CIDEr on Karpathy test split.
09 | VinVL | verified | 140.9 | 2022 | Source | VinVL large model. CVPR 2021. CIDEr on Karpathy test split.
10 | Chameleon-SFT | unverified | 140.8 | 2024 | Paper, Code |
11 | BLIP | verified | 136.7 | 2022 | Source | BLIP (Bootstrapping Language-Image Pre-training). ICML 2022. CIDEr on Karpathy test split.
12 | CogVLM | verified | 126.4 | 2023 | Source | CogVLM-17B zero-shot. Tsinghua KEG, Nov 2023. CIDEr on COCO Karpathy test split. Zero-shot result.

CIDEr

CIDEr is one of the evaluation metrics reported for COCO Captions. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for CIDEr: verified, paper, vendor, community, unverified
Rank | Model | Trust | Score | Year | Links | Notes
01 | BLIP-2 | verified | 145.8 | 2023 | Paper | COCO Karpathy test split. FlanT5-XXL backbone. Table 12. arxiv:2301.12597
02 | CoCa | verified | 143.6 | 2022 | Paper | COCO Karpathy test split. Single-model fine-tune. Table 4. arxiv:2205.01068

R@1

R@1 (Recall@1) is one of the evaluation metrics reported for COCO Captions. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for R@1: verified, paper, vendor, community, unverified
Rank | Model | Trust | Score | Year | Links
01 | BLIP ViT-L | unverified | 65.1 | 2022 | Paper, Code
02 | ALIGN | unverified | 59.9 | 2021 | Paper, Code
03 | AltCLIP | unverified | 42.9 | 2022 | Paper, Code
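
For readers unfamiliar with the metric, and assuming the scores above are Recall@1 for image-text retrieval (the page does not say which retrieval direction is reported), the following minimal sketch shows how such a score could be computed from a similarity matrix. The function name `recall_at_k` and the toy data are hypothetical illustrations, not part of any model's evaluation code.

```python
def recall_at_k(similarity, ground_truth, k=1):
    """Percentage of queries whose top-k retrieved items include a correct match.

    similarity[q][c] is the score between query q and candidate c.
    ground_truth[q] is the set of candidate indices that are correct for q.
    """
    hits = 0
    for q, scores in enumerate(similarity):
        # Rank candidate indices from most to least similar.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if any(c in ground_truth[q] for c in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy example (made-up numbers): three queries scored against three candidates.
similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.1],
]
ground_truth = [{0}, {1}, {1}]
r_at_1 = recall_at_k(similarity, ground_truth, k=1)
```

In this toy data the first and third queries retrieve a correct item at rank 1 and the second does not, so two of three queries count as hits. Scores are scaled to 0-100, matching the range of the leaderboard entries.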

BLEU-4

BLEU-4 is one of the evaluation metrics reported for COCO Captions. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for BLEU-4: verified, paper, vendor, community, unverified
Rank | Model | Trust | Score | Year | Links | Notes
01 | GIT | verified | 44.1 | 2022 | Source | GIT. BLEU-4 on Karpathy test split.
02 | GIT2 | verified | 44.1 | 2022 | Source | GIT2. BLEU-4 on Karpathy test split.
03 | OFA | verified | 43.9 | 2022 | Source | OFA-Huge. BLEU-4 on Karpathy test split.
04 | BLIP-2 (OPT 2.7B) | verified | 43.7 | 2023 | Source | BLIP-2 with frozen OPT-2.7B. BLEU-4 on Karpathy test split.
05 | VinVL | verified | 41 | 2022 | Source | VinVL large model. CVPR 2021. BLEU-4 on Karpathy test split.
06 | CoCa | verified | 40.9 | 2022 | Source | CoCa. BLEU-4 on Karpathy test split.
07 | SimVLM | verified | 40.6 | 2022 | Source | SimVLM large. ICLR 2022. BLEU-4 on Karpathy test split.
08 | BLIP | verified | 40.4 | 2022 | Source | BLIP. ICML 2022. BLEU-4 on Karpathy test split.
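
As a rough illustration of what a BLEU-4 score measures, here is a minimal sentence-level sketch: clipped n-gram precision for n = 1 to 4, combined by geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization and no smoothing, and the helper names `ngrams` and `bleu4` are hypothetical. Published leaderboard numbers are typically corpus-level, use a dedicated tokenizer, and are scaled to 0-100, so this sketch will not reproduce them exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Sentence-level BLEU-4 (0 to 1, unsmoothed) for one caption and its references."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return bp * math.exp(sum(log_precisions) / 4)


score = bleu4("the cat sat on the mat", ["the cat sat on the red mat"])
```

A caption identical to one of its references scores 1.0, and a caption sharing no 4-gram with any reference scores 0.0 under this unsmoothed variant.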

SPICE

SPICE is one of the evaluation metrics reported for COCO Captions. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for SPICE: verified, paper, vendor, community, unverified
Rank | Model | Trust | Score | Year | Links | Notes
01 | SimVLM | verified | 25.4 | 2022 | Source | SimVLM large. ICLR 2022. SPICE on Karpathy test split.
02 | OFA | verified | 24.8 | 2022 | Source | OFA-Huge. SPICE on Karpathy test split.
03 | CoCa | verified | 24.7 | 2022 | Source | CoCa. SPICE on Karpathy test split.