Codesota · Benchmark · NoCaps

NoCaps.

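NoCaps pairs about 15K images from the Open Images validation and test sets with 166K human-written captions. It specifically tests zero-shot generalization to novel objects: object classes that are absent from COCO Captions, the captioning data models are usually trained on.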

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


CIDEr

CIDEr (Consensus-based Image Description Evaluation) is the evaluation metric reported for NoCaps. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
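CIDEr scores a candidate caption by the TF-IDF-weighted n-gram cosine similarity between it and each human reference, averaged over the references and over n = 1 to 4. The sketch below illustrates that idea under stated assumptions: lowercased whitespace tokenization, document frequencies counted over the reference sets of the evaluated images, and a ×10 scale. It deliberately omits the Gaussian length penalty and count clipping of the CIDEr-D variant used by common evaluation toolkits, so it will not reproduce published NoCaps numbers. The function names are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tfidf(counts, df, num_images):
    """Weight raw n-gram counts by log inverse document frequency."""
    # Counter returns 0 for unseen n-grams; max(1.0, ...) avoids log(0).
    return {g: c * (math.log(num_images) - math.log(max(1.0, df[g])))
            for g, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def cider(candidates, references, max_n=4):
    """Simplified CIDEr (no length penalty, no clipping).

    candidates: one caption string per image.
    references: one non-empty list of reference strings per image.
    Returns (corpus_score, per_image_scores).
    """
    num_images = len(candidates)
    cands = [c.lower().split() for c in candidates]
    refs = [[r.lower().split() for r in rs] for rs in references]
    per_image = [0.0] * num_images
    for n in range(1, max_n + 1):
        # Document frequency: number of images whose references contain the n-gram.
        df = Counter()
        for rs in refs:
            df.update({g for r in rs for g in ngrams(r, n)})
        for i in range(num_images):
            vc = tfidf(ngrams(cands[i], n), df, num_images)
            sims = [cosine(vc, tfidf(ngrams(r, n), df, num_images)) for r in refs[i]]
            # Average over references, then weight each n uniformly.
            per_image[i] += sum(sims) / len(sims) / max_n
    per_image = [10.0 * s for s in per_image]  # x10 scale, as in CIDEr-D
    return sum(per_image) / num_images, per_image
```

A caption identical to its references scores 10 here, and one sharing no informative n-grams scores 0. Published CIDEr figures such as those in the table are conventionally reported multiplied by 100. Note that with a single image every IDF weight is zero, so the corpus needs at least two images for meaningful scores.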

Trust tiers for CIDEr: verified, paper, vendor, community, unverified.

Muted rows were not state of the art when published: an earlier or same-year result had already scored higher.

| Rank | Model | Notes | Trust | CIDEr | Year | Links |
|---|---|---|---|---|---|---|
| 01 | CogVLM-17B | CogVLM-17B fine-tuned. NoCaps overall CIDEr. NeurIPS 2024. Tsinghua/Zhipu. | verified | 128.3 | 2023 | Source |
| 02 | PaLI-X-55B | PaLI-X 55B fine-tuned. NoCaps overall CIDEr. 2023. Google Research. | verified | 126.3 | 2023 | Source |
| 03 | PaLI-17B | PaLI-17B fine-tuned. NoCaps overall CIDEr. ICLR 2023. Google Research. | verified | 124.4 | 2022 | Source |
| 04 | BLIP-2 (FlanT5XL) | BLIP-2 with ViT-g + FlanT5XL. Zero-shot NoCaps val CIDEr (overall). ICML 2023. Salesforce. | verified | 123.7 | 2023 | Source |
| 05 | BLIP-2 (OPT 2.7B) | BLIP-2 with ViT-g + OPT 2.7B. Zero-shot NoCaps val CIDEr (overall). ICML 2023. Salesforce. | verified | 121.6 | 2023 | Source |
| 06 | BLIP ViT-L | | unverified | 113.2 | 2022 | Paper, Code |
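The muting rule stated above the table can be expressed as a short check. This is a minimal sketch, assuming "scored higher" means a strictly higher CIDEr and that results are compared by year alone, since the page gives no finer publication dates. The `Row` type and function names are hypothetical, not part of Codesota.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard entry (hypothetical structure for illustration)."""
    model: str
    cider: float
    year: int


# Scores and years copied from the leaderboard table.
ROWS = [
    Row("CogVLM-17B", 128.3, 2023),
    Row("PaLI-X-55B", 126.3, 2023),
    Row("PaLI-17B", 124.4, 2022),
    Row("BLIP-2 (FlanT5XL)", 123.7, 2023),
    Row("BLIP-2 (OPT 2.7B)", 121.6, 2023),
    Row("BLIP ViT-L", 113.2, 2022),
]


def is_muted(row: Row, rows: list[Row]) -> bool:
    """True if a result from the same year or earlier scored strictly higher.

    A row cannot strictly outscore itself, so no self-exclusion is needed.
    """
    return any(other.year <= row.year and other.cider > row.cider for other in rows)


def sota_at_publication(rows: list[Row]) -> list[str]:
    """Models that were state of the art when published, in table order."""
    return [r.model for r in rows if not is_muted(r, rows)]


print(sota_at_publication(ROWS))  # ['CogVLM-17B', 'PaLI-17B']
```

Under this reading, only CogVLM-17B and PaLI-17B would be shown unmuted; PaLI-X-55B is muted because CogVLM-17B scored higher in the same year. Ties are treated as not muting each other, which is an assumption the page does not settle.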