Video Classification
Video classification, the task of recognizing actions and events in clips, extends image understanding into the temporal domain: models must reason about motion, context, and temporal ordering. The field evolved from hand-crafted features (HOG, optical flow) through 3D CNNs (C3D, I3D) to video transformers such as TimeSformer and VideoMAE, which treat frames as spatiotemporal tokens. Top-1 accuracy on Kinetics-400 now exceeds 88%, but the real challenge is long-form video understanding, where events unfold over minutes rather than seconds. The task is essential for content moderation, sports analytics, and security applications.
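To make "spatiotemporal tokens" concrete, here is a minimal sketch of how a clip can be cut into tubelets (small blocks spanning a few frames and a patch of pixels), each of which becomes one token. The function names, the single-channel clip, and the default sizes (2 frames by 16x16 pixels) are illustrative assumptions, not taken from any particular model's code. Real video transformers also pass each flattened token through a learned projection, which is omitted here.

```python
def token_grid(frames, height, width, tubelet=2, patch=16):
    """Return the (time, rows, cols) grid of tokens for a clip.

    tubelet and patch are assumed sizes chosen for illustration.
    """
    if frames % tubelet or height % patch or width % patch:
        raise ValueError("clip dimensions must be divisible by the token size")
    return frames // tubelet, height // patch, width // patch


def tokenize(video, tubelet, patch):
    """Split a single-channel clip video[t][y][x] into flattened tubelet tokens."""
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    nt, nh, nw = token_grid(frames, height, width, tubelet, patch)
    tokens = []
    for ti in range(nt):
        for yi in range(nh):
            for xi in range(nw):
                # Gather every pixel inside this tubelet, frame by frame.
                token = [
                    video[t][y][x]
                    for t in range(ti * tubelet, (ti + 1) * tubelet)
                    for y in range(yi * patch, (yi + 1) * patch)
                    for x in range(xi * patch, (xi + 1) * patch)
                ]
                tokens.append(token)
    return tokens


# A 16-frame 224x224 clip with the assumed defaults gives an 8 x 14 x 14 grid.
nt, nh, nw = token_grid(16, 224, 224)
print(nt * nh * nw)  # 1568 tokens

# Toy clip: 4 frames of 4x4 pixels, each pixel labelled by its position.
clip = [[[t * 16 + y * 4 + x for x in range(4)] for y in range(4)] for t in range(4)]
toy_tokens = tokenize(clip, tubelet=2, patch=2)
print(len(toy_tokens), toy_tokens[0])  # 8 [0, 1, 4, 5, 16, 17, 20, 21]
```

The token count grows with clip length, and self-attention compares every token with every other, which is one reason minutes-long video is much harder to process than a clip of a few seconds.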
Kinetics-400
Human action recognition across 400 action classes
Top 10
Leading models on Kinetics-400.
All datasets
3 datasets tracked for this task.
Related tasks
Other tasks in Computer Vision.