Computer Vision · video-classification

Video classification.

Video classification assigns each video clip a label from a predefined set of categories. A model analyzes the temporal sequence of frames to understand the content and label the clip as a whole, rather than labeling individual frames.
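To make the idea of one label per clip concrete, here is a minimal sketch of the simplest possible baseline: average per-frame class scores over time, then pick the highest-scoring class. The function name `classify_clip`, the class names, and the scores are all hypothetical and chosen for illustration. This is not a description of how any model listed on this page works; those models typically treat time in more sophisticated ways.

```python
from typing import List, Sequence


def classify_clip(
    frame_scores: Sequence[Sequence[float]],
    class_names: Sequence[str],
) -> str:
    """Average per-frame class scores over time and return the best class name.

    frame_scores[t][c] is a (hypothetical) score for class c at frame t.
    """
    if not frame_scores:
        raise ValueError("clip has no frames")
    num_classes = len(class_names)
    totals: List[float] = [0.0] * num_classes
    for scores in frame_scores:
        if len(scores) != num_classes:
            raise ValueError("each frame needs one score per class")
        for i, score in enumerate(scores):
            totals[i] += score
    # Temporal mean pooling: one averaged score per class for the whole clip.
    means = [total / len(frame_scores) for total in totals]
    best = max(range(num_classes), key=means.__getitem__)
    return class_names[best]


# Illustrative input only: three frames, two made-up classes.
example_classes = ["running", "swimming"]
example_frames = [
    [0.6, 0.4],
    [0.2, 0.8],
    [0.3, 0.7],
]
clip_label = classify_clip(example_frames, example_classes)
print(clip_label)  # swimming
```

Mean pooling ignores frame order, so it cannot distinguish actions that differ only in direction or sequence. That limitation is one reason motion-sensitive benchmarks are tracked alongside appearance-heavy ones.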

6 datasets · 13 results · Canonical metric: top-1-accuracy
§ 02 · Canonical benchmark

The reference dataset.

Kinetics-400

Human action recognition across 400 action classes

Primary metric: top-1-accuracy
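Top-1 accuracy is the share of clips for which the single highest-scoring class matches the ground-truth label. The sketch below shows the calculation on made-up scores; the helper name `top1_accuracy` and all values are assumptions for illustration, not benchmark data or any leaderboard's official evaluation code.

```python
from typing import Sequence


def top1_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> float:
    """Return the percentage of clips whose top-scoring class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one clip")
    correct = 0
    for clip_scores, label in zip(scores, labels):
        # Index of the highest score is the predicted class.
        predicted = max(range(len(clip_scores)), key=clip_scores.__getitem__)
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


# Hypothetical scores for four clips over three classes.
example_scores = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.8, 0.1, 0.1],  # predicted class 0
    [0.3, 0.3, 0.4],  # predicted class 2
    [0.5, 0.4, 0.1],  # predicted class 0
]
example_labels = [1, 0, 1, 0]
print(top1_accuracy(example_scores, example_labels))  # 75.0
```

Three of the four predictions match their labels, so the result is 75.0, on the same 0 to 100 scale as the scores reported on this page.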
§ 03 · Top 10

Leading models.

Leading models on Kinetics-400.

#  Model                        Top-1 accuracy (%)  Year  Source
1  DINOv3 (7B)                  88.2                2025  paper
2  VideoMAE ViT-H ↑320          87.4                2022  paper
3  V-JEPA 2 ViT-g (1B, 384px)   87.3                2025  paper
4  VideoPrism-g                 87.2                2024  paper
5  DINOv2 (ViT-g/14)            78.4                2023  paper

§ 04 · All datasets

Tracked datasets.

6 datasets tracked for this task.

Kinetics-400 (canonical) · 5 results · top-1-accuracy · Top: DINOv3 (7B) 88.2
Something-Something V2 · 5 results · top-1-accuracy · Top: V-JEPA 2 ViT-g (1B, 384px) 77.3
UCF-101 · 3 results · top-1-accuracy · Top: VideoMAE ViT-B 96.1
COIN · 0 results
Diving-48 · 0 results
Epic-Kitchens-100 (EK100) · 0 results
§ 05 · Related tasks

Other tasks in Computer Vision.

3D Understanding · Depth estimation · Document Image Classification · Document Layout Analysis · Document Parsing · Document Understanding · General OCR Capabilities · Handwriting Recognition