Who leads the NoCaps benchmark?

CogVLM-17B currently leads NoCaps with a CIDEr score of 128.3.

What is the state-of-the-art score on NoCaps?

The state-of-the-art result on NoCaps is 128.3 CIDEr, achieved by CogVLM-17B as of 2023.

How many models are tracked on NoCaps?

Codesota tracks 6 models on NoCaps.

When was the NoCaps leaderboard last updated?

The NoCaps leaderboard on Codesota includes results through 2023, with the earliest tracked result from 2022.

Codesota · Benchmark · NoCaps

NoCaps.

Name: NoCaps Benchmark Results
Creator: Unknown
Published: 2022-01-01
License: https://creativecommons.org/licenses/by/4.0/

15K images from the Open Images validation and test sets, with 166K human-written captions. The benchmark specifically tests zero-shot generalization to novel objects not seen during training.

§ 01 · SOTA history

Year over year.
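The year-over-year history can be recomputed from the leaderboard rows on this page. The sketch below assumes the history tracks the best CIDEr score available at the end of each year, carrying an earlier best forward if no newer result beats it; the function name `sota_history` is hypothetical.

```python
# (model, CIDEr, year) rows copied from the leaderboard table on this page.
RESULTS = [
    ("CogVLM-17B", 128.3, 2023),
    ("PaLI-X-55B", 126.3, 2023),
    ("PaLI-17B", 124.4, 2022),
    ("BLIP-2 (FlanT5XL)", 123.7, 2023),
    ("BLIP-2 (OPT 2.7B)", 121.6, 2023),
    ("BLIP ViT-L", 113.2, 2022),
]


def sota_history(results):
    """Return (year, model, score) for each year, where model/score is the
    best result available by that year (higher CIDEr is better)."""
    history = []
    best = None  # (model, score) of the best result seen so far
    for year in sorted({year for _, _, year in results}):
        # Best result published in this particular year.
        model, score, _ = max((r for r in results if r[2] == year), key=lambda r: r[1])
        # Only replace the running best if this year actually improved on it.
        if best is None or score > best[1]:
            best = (model, score)
        history.append((year, best[0], best[1]))
    return history


for year, model, score in sota_history(RESULTS):
    print(year, model, score)
# 2022 PaLI-17B 124.4
# 2023 CogVLM-17B 128.3
```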

§ 02 · Leaderboard

Results by metric.

CIDEr

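CIDEr (Consensus-based Image Description Evaluation) is the evaluation metric reported for NoCaps. It scores a generated caption by the TF-IDF-weighted n-gram similarity between that caption and the human reference captions for the same image. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.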
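To make the metric concrete, here is a minimal sketch of plain CIDEr as described by Vedantam et al. (2015): each caption becomes a TF-IDF vector over n-grams of length 1 to 4, the candidate is compared to every reference by cosine similarity, and the per-n averages are combined with uniform weights. This is an illustration, not the scorer behind the numbers above. Leaderboard scores are usually computed with the CIDEr-D variant from the coco-caption toolkit, which additionally clips n-gram counts, applies a Gaussian length penalty, and scales the result (scores such as 128.3 are conventionally reported multiplied by 100). The whitespace tokenizer and the choice to compute IDF over the evaluated reference sets are assumptions of this sketch.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption of this sketch)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Count every contiguous n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cider(candidates, references, max_n=4):
    """Simplified corpus-level CIDEr (no CIDEr-D clipping, length penalty, or scaling).

    candidates: dict image_id -> generated caption
    references: dict image_id -> list of human reference captions
    Returns the mean over images of the averaged n-gram cosine similarities (0..1).
    """
    num_images = len(references)

    # Document frequency: in how many images' reference sets each n-gram occurs.
    doc_freq = Counter()
    for refs in references.values():
        seen = set()
        for ref in refs:
            tokens = tokenize(ref)
            for n in range(1, max_n + 1):
                seen.update(ngrams(tokens, n))
        doc_freq.update(seen)

    def tfidf(tokens, n):
        """TF-IDF vector (as a dict) for the n-grams of one caption."""
        counts = ngrams(tokens, n)
        total = sum(counts.values())
        return {g: (c / total) * math.log(num_images / max(1.0, doc_freq[g]))
                for g, c in counts.items()}

    def cosine(a, b):
        """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
        dot = sum(v * b.get(g, 0.0) for g, v in a.items())
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    per_image = []
    for image_id, refs in references.items():
        cand = tokenize(candidates[image_id])
        per_n = []
        for n in range(1, max_n + 1):
            cand_vec = tfidf(cand, n)
            # Average similarity to all references for this n-gram length.
            sims = [cosine(cand_vec, tfidf(tokenize(r), n)) for r in refs]
            per_n.append(sum(sims) / len(sims))
        per_image.append(sum(per_n) / max_n)  # uniform weights w_n = 1/N
    return sum(per_image) / len(per_image)


# Toy usage with made-up captions (not NoCaps data).
refs = {
    "img1": ["a dog runs on the grass", "a dog running on grass"],
    "img2": ["a red car parked on the street", "a car on a city street"],
}
good = {"img1": "a dog runs on the grass", "img2": "a red car parked on the street"}
swapped = {"img1": "a red car parked on the street", "img2": "a dog runs on the grass"}
print(cider(good, refs) > cider(swapped, refs))  # True
```

Note how IDF does the work here: words shared by every image's references ("a", "on", "the") get zero weight, so the swapped captions score zero even though they share several common words with the references.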

Higher is better

Trust tiers for CIDEr: verified, paper, vendor, community, unverified.

Muted rows were not state of the art when published: an earlier or same-year result had already scored higher.
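The muting rule can be expressed as a simple check over the leaderboard rows. A minimal sketch, assuming "scored better" means a strictly higher CIDEr and that the year column is the only publication date available (so same-year results count as earlier); the `Result` type and `is_sota_when_published` function are hypothetical names.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    score: float  # NoCaps overall CIDEr; higher is better
    year: int     # publication year as listed on the leaderboard


def is_sota_when_published(row: Result, results: list[Result]) -> bool:
    """False if an earlier or same-year result already had a strictly higher score."""
    return not any(
        other.year <= row.year and other.score > row.score for other in results
    )


# Rows copied from the leaderboard table on this page.
RESULTS = [
    Result("CogVLM-17B", 128.3, 2023),
    Result("PaLI-X-55B", 126.3, 2023),
    Result("PaLI-17B", 124.4, 2022),
    Result("BLIP-2 (FlanT5XL)", 123.7, 2023),
    Result("BLIP-2 (OPT 2.7B)", 121.6, 2023),
    Result("BLIP ViT-L", 113.2, 2022),
]

muted = [r.model for r in RESULTS if not is_sota_when_published(r, RESULTS)]
print(muted)
# ['PaLI-X-55B', 'BLIP-2 (FlanT5XL)', 'BLIP-2 (OPT 2.7B)', 'BLIP ViT-L']
```

The pairwise comparison is quadratic in the number of rows, which is fine for a leaderboard of this size; a year-sorted running maximum would scale better.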

Rank	Model	Notes	Trust	CIDEr	Year	Links
01	CogVLM-17B	Fine-tuned. NoCaps overall CIDEr. NeurIPS 2024. Tsinghua/Zhipu.	verified	128.3	2023	Source
02	PaLI-X-55B	Fine-tuned. NoCaps overall CIDEr. 2023. Google Research.	verified	126.3	2023	Source
03	PaLI-17B	Fine-tuned. NoCaps overall CIDEr. ICLR 2023. Google Research.	verified	124.4	2022	Source
04	BLIP-2 (FlanT5XL)	ViT-g + FlanT5XL. Zero-shot NoCaps val CIDEr (overall). ICML 2023. Salesforce.	verified	123.7	2023	Source
05	BLIP-2 (OPT 2.7B)	ViT-g + OPT 2.7B. Zero-shot NoCaps val CIDEr (overall). ICML 2023. Salesforce.	verified	121.6	2023	Source
06	BLIP ViT-L		unverified	113.2	2022	Paper, Code
