Computer Vision

3D Understanding

Tasks related to understanding and processing 3D data, including 3D object detection, 3D reconstruction, multi-view estimation, and 3D scene understanding.

4 datasets, 0 results

3D Understanding is a core task in computer vision. The standard benchmarks used to evaluate models are listed below, along with state-of-the-art results where they are tracked.

Benchmarks & SOTA

Re10K

RealEstate10K

0 results

RealEstate10K is a large-scale dataset of real estate video clips with camera poses, used for 3D understanding and reconstruction.

No results tracked yet

CO3Dv2

Common Objects in 3D (CO3D), version 2 (CO3Dv2)

0 results

CO3Dv2 is a dataset for 3D object understanding, containing multi-view captures of common object categories.

No results tracked yet

ScanNet-1500

ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes (ScanNet)

0 results

ScanNet-1500 is an evaluation subset of ScanNet, a dataset of richly annotated 3D reconstructions of indoor scenes, used for 3D scene understanding.

No results tracked yet

DTU

DTU Multi-View Stereo (MVS) Dataset (DTU Robot Image Data Sets)

0 results

DTU is a dataset for multi-view estimation and 3D reconstruction, containing multi-view stereo images.

No results tracked yet

Related Tasks

Few-Shot Image Classification

Image classification with limited labeled examples per class (few-shot learning). Models are evaluated on their ability to classify images into categories with only a handful of training examples (typically 1-10) per class.
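As a rough illustration of evaluating with only a handful of labeled examples per class, the sketch below samples one evaluation episode. It assumes the common "N-way, K-shot" episodic convention, which this page does not specify; the function name `sample_episode` and its parameters are hypothetical.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Pick n_way classes, then k_shot support and n_query query indices per class.

    Hypothetical helper: assumes an episodic N-way K-shot protocol.
    """
    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Only classes with enough examples can supply both support and query sets.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    classes = rng.sample(eligible, n_way)

    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += picked[:k_shot]   # the few labeled examples the model sees
        query += picked[k_shot:]     # held-out examples used for scoring
    return classes, support, query

# Toy data: 50 examples spread evenly over 5 classes.
labels = [i % 5 for i in range(50)]
classes, support, query = sample_episode(labels, n_way=3, k_shot=2, n_query=4,
                                         rng=random.Random(0))
```

Accuracy would then be averaged over many such episodes; the number of episodes and the classifier itself are left out of this sketch.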

Open-Vocabulary Object Detection

Object detection with an open vocabulary: detecting objects from arbitrary text descriptions rather than from a fixed set of categories.

Object Counting

Object counting is a computer vision task that identifies and enumerates distinct objects in images and videos. Methods aim to differentiate between object types, sizes, and shapes, even in crowded or dynamically changing scenes. A typical pipeline runs a deep learning detector, such as a convolutional neural network (CNN), to recognize and localize objects, then aggregates the detections into a total count. Applications include manufacturing, where counting supports quality control and production monitoring.
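The detect-then-aggregate pipeline can be sketched as follows. This is a minimal illustration, not a specific library's API: the `Detection` type, the `count_objects` helper, and the 0.5 confidence threshold are all assumptions, and the detections are hand-written stand-ins for a real detector's output.

```python
from collections import Counter
from typing import NamedTuple

class Detection(NamedTuple):
    """One detector output (hypothetical type): a class label and a confidence score."""
    label: str
    score: float

def count_objects(detections, min_score=0.5):
    """Aggregate detections into per-class counts, skipping low-confidence ones."""
    return Counter(d.label for d in detections if d.score >= min_score)

# Stand-in detector output for a single frame.
detections = [
    Detection("bottle", 0.92),
    Detection("bottle", 0.81),
    Detection("cap", 0.40),   # below the threshold, so not counted
    Detection("cap", 0.77),
]
counts = count_objects(detections)
total = sum(counts.values())
```

The threshold trades missed objects against false counts; in crowded scenes, duplicate boxes on the same object would also need to be suppressed before aggregation.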

Video Segmentation

Video segmentation is the task of partitioning video frames into multiple segments or objects. Unlike image segmentation, which works on static images, video segmentation tracks objects across the frames of a video sequence.
