General

A broad category encompassing machine learning research and tasks that don't fit specifically into vision or language domains, including general ML methods, optimization, and cross-domain approaches.

11 tasks, 87 datasets, 8 results

Tasks & Benchmarks

Video-Language Models

CG-Bench
CinePile
EgoLife
EgoSchema
MMVU
MMWorld
MVP
PLM-VideoBench
67.7 (MBAcc), PLM (8B)
TOMATO
TempCompass
TemporalBench (MBA-short QA)
Video-MMLU (2025)
Video-MMMU
VideoHolmes

Coding Agents

CRUX-O
87.8 (Pass@1), Qwen2.5-Plus
55.5 (Pass@1), Qwen2.5-72B-Instruct
MBPP
88.2 (Pass@1), Qwen2.5-72B-Instruct
MultiPL-E
77 (Pass@1), Qwen2.5-Plus
SciCode

Embedding models

No datasets indexed yet. Contribute on GitHub

Omni models

DailyOmni
WorldSense

Reasoning

No datasets indexed yet. Contribute on GitHub

Reinforcement Learning

No datasets indexed yet. Contribute on GitHub

Retrieval

AmsterTime
BEIR
CodeSearchNet (CSN)
MLDR (English subset)
Revisited Paris (R_Par) — Medium split
StackOverflow-QA (StackQA)

Vision-Language Models

AI2D
GQA
HallusionBench
IntelligentBench
M-LongDoc
MEGA-Bench (macro)
MM-Vet
MMBench-CN
MMBench-EN
MMBench-V1.1
MME
MMStar
MMT-Bench
MTVQA
Meta-World authors' collected dataset
NIH/Multi-needle
OlympiadBench (full)
OmniBench
RefCOCO (2016)
RefCOCO / RefCOCO+ / RefCOCOg (overall)
SO100 real-world: Pick-Place, Stacking, Sorting
SO101 real-world: Pick-Place-Lego
VCR-Wiki-EN-Easy
VCR-Wiki-ZH-Easy
VQAv2
Vibe-Eval
WISE

World Models

No datasets indexed yet. Contribute on GitHub

Computer Use Agents

MMB-GUI (MMBench-GUI)
OSW-G (OSWorld-G)
OSWorld (50 steps)
SSv2 (Screenshot-v2)
UI-V (UI-Vision)
