Other
Other tasks that don't fit into specific categories
This category collects tasks that do not fit neatly into the other categories. Below you will find the standard benchmarks used to evaluate models, along with current state-of-the-art results.
Benchmarks & SOTA
NAVI
NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations
NAVI is a category-agnostic multi-view image collection dataset with high-quality 3D scans and precise 2D–3D alignments (per-image camera poses). It was created to enable systematic evaluation of image-based 3D reconstruction, multi-view geometric correspondence, and surface-level/keypoint correspondence tasks from casual (in-the-wild) image collections where traditional SfM often fails. NAVI provides object-centric image collections paired with near-perfect ground-truth camera parameters and 3D shapes, enabling extraction of accurate cross-view correspondences and evaluation following protocols such as Probe3D. Primary resources: NAVI project site (https://navidataset.github.io/), NeurIPS 2023 paper and supplemental materials, and the google/navi GitHub repository.
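To make the pose annotations concrete, here is a minimal sketch (plain NumPy; variable names such as mesh_vertices are hypothetical) of how a scanned 3D shape can be projected into two views with a standard pinhole model to obtain cross-view pixel correspondences; the official NAVI tooling may organize this differently.

import numpy as np

def project(K, R, t, X_world):
    # Project (N, 3) world-frame points to pixels with a pinhole model: x ~ K (R X + t).
    X_cam = (R @ X_world.T + t.reshape(3, 1)).T   # world -> camera frame
    x = (K @ X_cam.T).T                           # apply intrinsics
    return x[:, :2] / x[:, 2:3]                   # perspective divide

# Hypothetical usage: projecting the same scanned surface points with the
# per-image poses of two views yields pixel-accurate cross-view matches.
# pts_a = project(K_a, R_a, t_a, mesh_vertices)
# pts_b = project(K_b, R_b, t_b, mesh_vertices)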
No results tracked yet
SPair
SPair-71k: A Large-scale Benchmark for Semantic Correspondence
SPair-71k is a large-scale benchmark dataset for semantic correspondence (semantic keypoint matching) introduced by Min et al. (2019). It contains 70,958 image pairs exhibiting large intra-class variations in viewpoint and scale, with accurate, rich annotations intended for evaluating semantic correspondence methods. Annotations include per-pair semantic keypoint correspondences, bounding boxes, segmentation masks, and metadata describing viewpoint/scale variation, truncation, and occlusion. The dataset is a common testbed for semantic keypoint matching and correspondence algorithms and is distributed with a project page and an arXiv preprint (arXiv:1908.10543). A Hugging Face dataset mirror is also available.
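Results on SPair-71k are typically reported as PCK (Percentage of Correct Keypoints) with a threshold proportional to the object bounding box. The sketch below shows that metric in plain NumPy; the official evaluation code may differ in details such as per-image versus per-point averaging.

import numpy as np

def pck_bbox(pred_kps, gt_kps, bbox_hw, alpha=0.1):
    # PCK with a bounding-box-normalised threshold.
    # pred_kps, gt_kps: (N, 2) arrays of (x, y) keypoint locations.
    # bbox_hw: (height, width) of the target object's bounding box.
    # alpha: relative threshold; 0.1 is the commonly reported setting.
    thresh = alpha * max(bbox_hw)
    dists = np.linalg.norm(np.asarray(pred_kps) - np.asarray(gt_kps), axis=1)
    return float((dists <= thresh).mean())

# Example: two of three predicted keypoints land within the threshold.
print(pck_bbox([[10, 10], [50, 52], [90, 120]],
               [[12, 11], [50, 50], [60, 60]],
               bbox_hw=(100, 80)))   # -> 0.666...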
No results tracked yet
Revisited Oxford (R_Ox) — Medium split
Revisited Oxford (R-Oxford / Roxford5k) — Medium split
Revisited Oxford (R-Oxford, also referred to as Roxford5k) is the corrected and re-annotated version of the classic Oxford Buildings image retrieval benchmark, introduced in “Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking” (Radenović et al., CVPR 2018 / arXiv:1803.11285). The authors provide revised ground-truth annotations (including bounding boxes and an updated query list of 70 queries: the 55 original queries plus 15 new challenging ones), three evaluation protocols of increasing difficulty (Easy / Medium / Hard), and an optional R1M set of one million hard distractor images for large-scale testing. The “Medium” split is the medium-difficulty protocol: images labeled Easy and Hard both count as positives, while Junk images are ignored during evaluation; it is the protocol used when papers report Medium mAP. The benchmark is widely used for instance-level image and landmark retrieval evaluation; the original Oxford images, the revisited annotation files (e.g. gnd_roxford5k.mat), and evaluation code are available from the project page.
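As a concrete illustration of the Medium protocol, the sketch below computes average precision from a ranked retrieval list while treating Easy and Hard images as positives and skipping Junk images. The annotation file path and field names in the usage comment are assumptions; for reported numbers, the authors' official evaluation code should be used.

import pickle

def average_precision(ranked_ids, positives, junk):
    # AP for a single query over a ranked list of database image indices.
    positives, junk = set(positives), set(junk)
    hits, seen, precision_sum = 0, 0, 0.0
    for idx in ranked_ids:
        if idx in junk:            # junk images are ignored, not penalised
            continue
        seen += 1
        if idx in positives:
            hits += 1
            precision_sum += hits / seen
    return precision_sum / max(len(positives), 1)

# Hypothetical usage with the revisited annotations (field names assumed):
# gnd = pickle.load(open("gnd_roxford5k.pkl", "rb"))["gnd"]
# aps = [average_precision(ranks[q], g["easy"] + g["hard"], g["junk"])
#        for q, g in enumerate(gnd)]
# print("mAP (Medium):", sum(aps) / len(aps))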
No results tracked yet
Natural Questions
Natural Questions (NQ)
Natural Questions (NQ) is a large question-answering corpus released by Google Research. Questions are real, anonymized, aggregated queries issued to the Google search engine. For each question, annotators are given a selected Wikipedia page (from the top search results) and label a long answer (typically a paragraph) and, if present, a short answer (one or more entities or spans); annotations may be null when the page contains no answer. NQ is intended to require reading and comprehending entire Wikipedia articles and is used for open-domain and reading-comprehension QA research. The public release contains on the order of a few hundred thousand examples (commonly cited: ~307k training examples) and is English-only. License: CC BY-SA 3.0.
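For quick inspection, a Hugging Face mirror of NQ can be streamed with the datasets library. The sketch below is a minimal example; the exact field layout ("question", "annotations") is assumed from the common natural_questions configuration and may differ between dataset versions.

from datasets import load_dataset

# Stream the dataset (the full download is large) and look at one example.
nq = load_dataset("natural_questions", split="train", streaming=True)

example = next(iter(nq))
print(example["question"]["text"])   # the real, anonymized user query
print(example["annotations"])        # long-answer / short-answer span labels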
No results tracked yet