
Depth Estimation

Depth estimation recovers 3D structure from 2D images, a problem that resisted classical computer vision for decades before deep learning cracked monocular depth prediction. The field shifted with MiDaS (2019), which showed that training on a mix of diverse datasets beats task-specific models, and again with Depth Anything (2024), which demonstrated that foundation-model scale changes the game. Modern systems achieve sub-5% absolute relative error on NYU Depth V2, but real-world robustness, handling reflections, transparency, and extreme lighting, remains the frontier. Accurate depth is critical for autonomous driving, AR/VR, and robotics, where reliable 3D perception is non-negotiable.

2 datasets · 10 results · Canonical metric: abs-rel

Canonical Benchmark

KITTI Depth

Outdoor depth estimation benchmarked against sparse LiDAR ground truth from autonomous-driving scenes

Primary metric: abs-rel (absolute relative error; lower is better)

Top 5

Leading models on KITTI Depth.

Rank  Model                       abs-rel  Year  Source
1     Depth Anything V2 (ViT-L)   0.040    2024  paper
2     Depth Anything V1 (ViT-L)   0.046    2024  paper
3     ZoeDepth-K                  0.053    2023  paper
4     MiDaS 3.1 (BEiT-512)        0.058    2024  paper
5     Marigold                    0.099    2023  paper
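abs-rel averages |pred - gt| / gt over pixels that have valid ground truth; since KITTI's LiDAR depth maps are sparse, pixels with no return must be masked out. A minimal sketch, with illustrative values:

```python
import numpy as np

def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt) over valid pixels.
    KITTI's LiDAR ground truth is sparse; zeros mark missing returns."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0                      # score only pixels with a LiDAR return
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

gt = np.array([10.0, 20.0, 0.0])       # metres; 0.0 = no ground truth
pred = np.array([9.0, 22.0, 5.0])
print(abs_rel(pred, gt))               # (0.1 + 0.1) / 2 -> 0.1
```

Because the error is relative, a 1 m miss at 10 m costs as much as a 2 m miss at 20 m, which is why abs-rel is the standard headline metric for depth.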

All datasets

2 datasets tracked for this task.


Run Inference

Looking to run a model? HuggingFace hosts inference for this task type.
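A minimal sketch using the transformers `depth-estimation` pipeline; the checkpoint name below is one plausible Hub model chosen for illustration, and any depth-estimation checkpoint on the Hub would work:

```python
def estimate_depth(image):
    """Run a Hugging Face depth-estimation pipeline on one image.

    `image` may be a file path, URL, or PIL.Image. The checkpoint name
    is an assumption, not one prescribed by this page.
    Requires: pip install transformers torch pillow
    """
    from transformers import pipeline  # lazy import; first call downloads weights
    depth = pipeline("depth-estimation",
                     model="depth-anything/Depth-Anything-V2-Small-hf")
    result = depth(image)
    return result["depth"]             # PIL Image holding the depth map
```

Note that relative-depth models (MiDaS, Depth Anything) return depth up to an unknown scale and shift, while metric models such as ZoeDepth predict absolute distances.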
