Multimodal

Image Captioning

Generating text descriptions of images (COCO Captions).

1 dataset, 0 results

Image captioning is a key multimodal task. Below are the standard benchmarks used to evaluate models, along with current state-of-the-art results.

Benchmarks & SOTA

