Codesota · Benchmark · NoCaps

NoCaps.

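About 15K images drawn from the Open Images validation and test sets (4.5K validation, 10.6K test), annotated with 166K human-written captions. The benchmark tests zero-shot generalization to novel objects, i.e. objects that do not appear in the COCO Captions training data.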

§ 02 · Leaderboard

Results by metric.

CIDEr

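CIDEr (Consensus-based Image Description Evaluation) is the metric reported for NoCaps. It measures how closely a generated caption's TF-IDF-weighted n-grams agree with those of the human reference captions. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.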

Higher is better
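To make the metric concrete, here is a minimal sketch of CIDEr-D, the variant (clipped n-gram weights plus a Gaussian length penalty with sigma = 6) that, to our understanding, the widely used coco-caption scorer implements. It is a simplified illustration under stated assumptions, not the official evaluator: it tokenizes by lowercasing and splitting on whitespace instead of using the PTB tokenizer, and it measures caption length in tokens. The function name `cider_d` and its dict-based inputs are hypothetical, so its numbers should not be expected to match the scores in the table below exactly.

```python
import math
from collections import Counter


def _ngrams(tokens, n):
    """Count every contiguous n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _tokenize(text):
    # Simplified tokenizer (assumption): lowercase + whitespace split.
    # The official coco-caption tooling uses the PTB tokenizer instead.
    return text.lower().split()


def cider_d(candidates, references, max_n=4, sigma=6.0):
    """Simplified CIDEr-D sketch (hypothetical helper, not the official scorer).

    candidates: dict image_id -> one generated caption (str)
    references: dict image_id -> list of human reference captions (str)
    Returns (corpus_score, per_image_scores); scores are scaled by 10.
    """
    image_ids = list(candidates)
    # Document frequency: in how many images' reference sets each n-gram occurs.
    doc_freq = Counter()
    for img in image_ids:
        seen = set()
        for ref in references[img]:
            toks = _tokenize(ref)
            for n in range(1, max_n + 1):
                seen.update(_ngrams(toks, n))
        doc_freq.update(seen)
    log_num_images = math.log(len(image_ids))

    def tfidf(tokens):
        """Return per-order TF-IDF vectors, their L2 norms, and the token count."""
        vecs, norms = [], []
        for n in range(1, max_n + 1):
            vec = {
                g: count * (log_num_images - math.log(max(1.0, doc_freq[g])))
                for g, count in _ngrams(tokens, n).items()
            }
            vecs.append(vec)
            norms.append(math.sqrt(sum(w * w for w in vec.values())))
        return vecs, norms, len(tokens)

    per_image = {}
    for img in image_ids:
        c_vecs, c_norms, c_len = tfidf(_tokenize(candidates[img]))
        total = 0.0
        for ref in references[img]:
            r_vecs, r_norms, r_len = tfidf(_tokenize(ref))
            # Gaussian length penalty: captions much longer or shorter than the reference lose credit.
            penalty = math.exp(-((c_len - r_len) ** 2) / (2 * sigma ** 2))
            for n in range(max_n):
                # Clipped dot product: each candidate weight is capped at the reference weight.
                dot = sum(
                    min(w, r_vecs[n].get(g, 0.0)) * r_vecs[n].get(g, 0.0)
                    for g, w in c_vecs[n].items()
                )
                # Cosine normalization, skipped when either vector is all zeros.
                if c_norms[n] > 0 and r_norms[n] > 0:
                    dot /= c_norms[n] * r_norms[n]
                total += dot * penalty
        # Average over n-gram orders and references, then scale by 10.
        per_image[img] = 10.0 * total / (max_n * len(references[img]))
    corpus = sum(per_image.values()) / len(per_image)
    return corpus, per_image
```

Because document frequencies are computed from the references of the images being scored, n-grams shared by every image's references get zero weight, and a score computed on a small subset is not comparable with full-split scores such as those reported on this leaderboard. At least two images are needed for any n-gram to receive a nonzero weight.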

Trust tiers for CIDEr: verified, paper, vendor, community, unverified.
| Rank | Model | Details | Trust | CIDEr | Year | Source |
|---|---|---|---|---|---|---|
| 01 | CogVLM-17B | CogVLM-17B fine-tuned. NoCaps overall CIDEr. NeurIPS 2024. Tsinghua/Zhipu. | verified | 128.3 | 2023 | Source ↗ |
| 02 | PaLI-X-55B | PaLI-X 55B fine-tuned. NoCaps overall CIDEr. 2023. Google Research. | verified | 126.3 | 2023 | Source ↗ |
| 03 | PaLI-17B | PaLI-17B fine-tuned. NoCaps overall CIDEr. ICLR 2023. Google Research. | verified | 124.4 | 2022 | Source ↗ |
| 04 | BLIP-2 (FlanT5XL) | BLIP-2 with ViT-g + FlanT5XL. Zero-shot NoCaps val CIDEr (overall). ICML 2023. Salesforce. | verified | 123.7 | 2023 | Source ↗ |
| 05 | BLIP-2 (OPT 2.7B) | BLIP-2 with ViT-g + OPT 2.7B. Zero-shot NoCaps val CIDEr (overall). ICML 2023. Salesforce. | verified | 121.6 | 2023 | Source ↗ |