About 15K images from the Open Images validation and test sets, with 166K human-written captions. Specifically tests zero-shot generalization to novel objects, i.e. object classes that do not appear in the captioning training data.
CIDEr is the reported evaluation metric for NoCaps. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
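
For readers unfamiliar with CIDEr, the following is a minimal sketch of the idea behind the metric: TF-IDF-weighted n-gram vectors for the candidate and each reference caption, compared by cosine similarity and averaged over n = 1..4. It is a simplified illustration, not the official NoCaps scorer. The function name `simple_cider`, the whitespace tokenizer, and the handling of unseen n-grams are assumptions made here, and the sketch omits the length penalty, count clipping, and rescaling of the CIDEr-D variant that evaluation toolkits typically report, so its values are not comparable to leaderboard numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tfidf(counts, doc_freq, num_images):
    """Turn n-gram counts into a sparse TF-IDF vector (dict)."""
    total = sum(counts.values())
    vec = {}
    for gram, count in counts.items():
        # Assumption: n-grams unseen in any reference get document frequency 1.
        df = max(doc_freq.get(gram, 0), 1)
        vec[gram] = (count / total) * math.log(num_images / df)
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def simple_cider(candidates, references, max_n=4):
    """Per-image simplified CIDEr scores in [0, 1].

    candidates: one caption string per image.
    references: one list of reference caption strings per image.
    """
    num_images = len(references)
    scores = [0.0] * num_images
    for n in range(1, max_n + 1):
        ref_counts = [[ngrams(r.lower().split(), n) for r in refs]
                      for refs in references]
        # Document frequency: number of images whose references contain the n-gram.
        doc_freq = Counter()
        for per_image in ref_counts:
            seen = set()
            for counts in per_image:
                seen.update(counts)
            doc_freq.update(seen)
        for i, cand in enumerate(candidates):
            cand_vec = tfidf(ngrams(cand.lower().split(), n), doc_freq, num_images)
            sims = [cosine(cand_vec, tfidf(rc, doc_freq, num_images))
                    for rc in ref_counts[i]]
            # Average over references, then weight each n-gram order equally.
            scores[i] += sum(sims) / len(sims) / max_n
    return scores


# Toy data (invented for illustration, not taken from NoCaps).
references = [
    ["a dog runs on the grass", "a dog is running in a field"],
    ["a cat sleeps on the sofa", "a cat is lying on a couch"],
    ["two birds sit on a wire", "birds perched on a power line"],
]
matching = ["a dog runs on the grass", "a cat sleeps on the sofa", "two birds sit on a wire"]
shuffled = ["a cat sleeps on the sofa", "two birds sit on a wire", "a dog runs on the grass"]

print(simple_cider(matching, references))
print(simple_cider(shuffled, references))
```

Captions that agree with their references on rare, informative n-grams score higher than captions that only share common words, which is why a higher CIDEr is read as better agreement with human captions.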