3D Understanding
Tasks related to understanding and processing 3D data, including 3D object detection, 3D reconstruction, multi-view estimation, and 3D scene understanding.
3D Understanding is a key task in computer vision. Below you will find the standard benchmarks used to evaluate models, along with current state-of-the-art results.
Benchmarks & SOTA
Re10K
RealEstate10K
RealEstate10K is a large-scale dataset for 3D understanding and reconstruction, built from real estate video clips annotated with camera poses.
No results tracked yet
CO3Dv2
Common Objects in 3D (CO3D) — version 2 (CO3Dv2)
CO3Dv2 is a dataset for 3D object understanding, containing multi-view captures of common object categories.
No results tracked yet
ScanNet-1500
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes (ScanNet)
ScanNet-1500 is an evaluation subset of ScanNet, a dataset of richly annotated 3D reconstructions of indoor scenes, used for 3D scene understanding.
No results tracked yet
DTU
DTU Multi-View Stereo (MVS) Dataset (DTU Robot Image Data Sets)
DTU is a multi-view stereo dataset used for multi-view estimation and 3D reconstruction.
No results tracked yet
Related Tasks
Few-Shot Image Classification
Image classification with limited labeled examples per class (few-shot learning). Models are evaluated on their ability to classify images into categories with only a handful of training examples (typically 1-10) per class.
Open-Vocabulary Object Detection
Object detection with an open vocabulary: detecting objects from arbitrary text descriptions rather than from a fixed set of categories.
Object Counting
Object counting is the computer vision task of identifying and enumerating distinct objects in images and videos. Models are expected to distinguish between object types, sizes, and shapes, even in crowded or dynamically changing scenes. A typical pipeline first runs object detection with a deep learning model, such as a convolutional neural network (CNN), to recognize and localize objects, then aggregates the detections into a total count. Applications include quality control and production monitoring in manufacturing.
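The detect-then-aggregate step can be illustrated with a minimal sketch. It assumes a detector has already produced a list of labeled, scored detections; the `Detection` type, the `count_objects` helper, the 0.5 threshold, and the sample values are all hypothetical and not tied to any particular model or library.

```python
from collections import Counter
from typing import NamedTuple


class Detection(NamedTuple):
    """Hypothetical detector output: one entry per localized object."""
    label: str    # predicted class name
    score: float  # detector confidence in [0, 1]


def count_objects(detections, min_score=0.5):
    """Aggregate per-class counts from detections at or above min_score."""
    return Counter(d.label for d in detections if d.score >= min_score)


# Illustrative detections only; a real pipeline would take these from a model.
detections = [
    Detection("bottle", 0.91),
    Detection("bottle", 0.84),
    Detection("cap", 0.77),
    Detection("bottle", 0.32),  # below the threshold, so not counted
]

counts = count_objects(detections)  # per-class counts
total = sum(counts.values())        # overall object count
```

The confidence threshold trades missed objects against double counting of spurious detections, so in practice it would be tuned per application.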
Video Segmentation
Video segmentation is the task of partitioning video frames into multiple segments or objects. Unlike image segmentation, which works on static images, video segmentation tracks objects across the frames of a video sequence.