Computer Vision

Research focused on enabling computers to interpret and understand visual information from images and videos, including tasks such as image classification, object detection, segmentation, and visual recognition.

3 tasks, 97 datasets, 0 results

Tasks & Benchmarks

3D generation

0 benchmarks, 0 results

Few-Shot Image Classification

97 benchmarks, 0 results

Video generation

0 benchmarks, 0 results


3D generation

No datasets indexed yet.

Few-Shot Image Classification

COCO 2017 Captions - Microsoft COCO Captions (COCO 2017 Captions)

COCO 2017 Panoptic Segmentation - Microsoft COCO (Common Objects in Context), 2017 Panoptic Segmentation

COCO 2017 Stuff - COCO-Stuff (COCO 2017 Stuff / COCO-Stuff 164K)

COCO Captions - COCO Captions Dataset (2015)

COCO minival - COCO minival Split (2014)

COCO test-challenge - COCO test-challenge Split (2014)

COCO val2017 (Instance Segmentation) - COCO 2017 Object Detection (validation split)

COCO-Stuff - Common Objects in COntext-stuff (2018)

COCO-Text - COCO-Text Dataset (2016)

COCO-WholeBody - COCO-WholeBody Dataset (2020)

ChartMimic_v2_Direct

CodeForces (CodeElo)

ComplexFuncBench

Crossmodal-3600 (XM3600) - Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset

DL3DV-Benchmarks (140) - DL3DV Benchmark (140 scenes)

Fluent Speech Commands

Free Music Archive (FMA) Small

HELMET - HELMET: How to Evaluate Long-context Language Models Effectively and Thoroughly

HiRoom

IMC (Image Matching Challenge) - Image Matching Challenge - Phototourism (IMC-PT)

ImageNet-Hard

LOFT - Long-Context Frontiers

LVD-142M

LVD-1689M

Language benchmarks (overall)

LibriSpeech-100h

LibriSpeech-Male-Female

MMVP - MMVP (Multimodal Visual Patterns) Benchmark

MRCR - Multi-Round Coreference Resolution

MegaDepth (19)

NSynth-Instruments

NTIRE 2024 Transparent Surface Challenge (relative) - NTIRE 2024: HR Depth from Images of Specular and Transparent Surfaces (Booster Dataset) (Relative Depth)

OCRBench v2 - OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Online-Mind2Web

SAT-493M - SAT-493M (Maxar 493M satellite imagery pretraining dataset)

SciVideoBench - SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

SimpleQA-Verified

Speech Commands V1

TAP-Vid (RGB-S) - TAP-Vid: A Benchmark for Tracking Any Point in a Video

Tanks and Temples (6) - Tanks and Temples: Benchmarking Large-Scale Scene Reconstruction

Vocal Imitation

WindowsAgentArena-V2

Video generation

No datasets indexed yet.
