
Keypoint Detection

Keypoint detection localizes specific anatomical or structural landmarks, such as body joints, facial features, and hand articulations, enabling pose estimation, gesture recognition, and motion capture. OpenPose (2017) was among the first systems to demonstrate real-time multi-person pose estimation, and the field has since progressed through HRNet, ViTPose, and RTMPose, each pushing accuracy, speed, or both. Modern whole-body systems detect 133 keypoints (body, feet, hands, and face) and can run in real time on mobile devices. Applications range from sports biomechanics (analyzing an athlete's form frame by frame) to sign language recognition and AR avatar puppeteering.
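To make "localizing a landmark" concrete: heatmap-based models such as HRNet and ViTPose predict one score map per keypoint, and the keypoint location is read off the peak of each map. The sketch below shows only that decoding step, assuming a NumPy array of shape (K, H, W); the function name `decode_heatmaps` is hypothetical, and real pipelines usually add sub-pixel refinement and rescaling to image coordinates.

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """Return (x, y) peak locations and peak scores for K keypoint heatmaps.

    heatmaps: array-like of shape (K, H, W), one score map per keypoint.
    This is an illustrative argmax decoder, not any specific model's method.
    """
    heatmaps = np.asarray(heatmaps, dtype=float)
    num_keypoints, height, width = heatmaps.shape

    # Flatten each map so a single argmax finds its highest-scoring pixel.
    flat = heatmaps.reshape(num_keypoints, -1)
    peak_index = flat.argmax(axis=1)
    scores = flat[np.arange(num_keypoints), peak_index]

    # Convert flat indices back to (row, column), then order as (x, y).
    ys, xs = np.unravel_index(peak_index, (height, width))
    coords = np.stack([xs, ys], axis=1)
    return coords, scores
```

The returned coordinates are in heatmap pixels, so a caller would still need to map them back to the original image resolution.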

Datasets: 2
Results: 3
Canonical metric: mAP
Canonical Benchmark

COCO Keypoints

Human pose estimation on COCO with 17 body keypoints

Primary metric: mAP
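On COCO Keypoints, average precision is built on Object Keypoint Similarity (OKS), which plays the role that IoU plays in box detection: a prediction counts as correct when its OKS with the ground truth exceeds a threshold, and AP is averaged over thresholds from 0.50 to 0.95. The following is a minimal sketch of OKS for a single person, assuming NumPy and the standard per-keypoint constants for the 17 COCO body keypoints. The function name `oks` and its argument names are illustrative; this is not the official evaluation code.

```python
import numpy as np

# Per-keypoint falloff constants for the 17 COCO body keypoints
# (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
COCO_SIGMAS = np.array([0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72,
                        0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89]) / 10.0

def oks(pred_xy, gt_xy, gt_visibility, area, sigmas=COCO_SIGMAS):
    """Object Keypoint Similarity between one prediction and one ground truth.

    pred_xy, gt_xy: arrays of shape (17, 2) in image pixels.
    gt_visibility:  array of shape (17,); values > 0 mean the keypoint is labeled.
    area:           ground-truth object area in pixels (the scale term).
    """
    pred_xy = np.asarray(pred_xy, dtype=float)
    gt_xy = np.asarray(gt_xy, dtype=float)
    labeled = np.asarray(gt_visibility) > 0
    if not labeled.any():
        return 0.0  # nothing to compare against

    # Squared pixel distance per keypoint.
    d2 = np.sum((pred_xy - gt_xy) ** 2, axis=1)

    # Normalize by object scale and per-keypoint tolerance (k_i = 2 * sigma_i).
    variances = (2.0 * sigmas) ** 2
    e = d2 / (2.0 * variances * (area + np.spacing(1)))

    # Average the Gaussian similarity over labeled keypoints only.
    return float(np.mean(np.exp(-e[labeled])))
```

A perfect prediction yields an OKS of 1.0, and the score decays toward 0 as keypoints drift, more slowly for large objects and for keypoints with larger constants such as hips.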

Top 10

Leading models on COCO Keypoints.

Rank  Model      AP    Year  Source
1     ViTPose-H  80.9  2022  paper
2     RTMPose-X  78.8  2023  paper
3     HRNet-W48  75.5  2019  paper

All datasets

2 datasets tracked for this task.

Run Inference

Looking to run a model? HuggingFace hosts inference for this task type.
