DINO-X

IDEA Research · open-source · Unknown params · Unified vision model with DINO-based detection head + large language model

DINO-X is a unified vision model for open-world object detection. It achieves 67.0 mask AP on LVIS v1.0 minival, the state of the art at the time of release (Nov 2024), and supports open-vocabulary and grounded detection. Paper: arXiv 2411.14347.

§ 02 · Benchmarks

Every benchmark DINO-X has a recorded score for.

#	Benchmark	Area · Task	Metric	Value	Rank	Date	Source
01	LVIS v1.0	Computer Vision · Object Detection	box-ap	71.4%	#1/4	2024-11-21	source ↗
02	LVIS v1.0	Computer Vision · Object Detection	mask-ap	67.0%	#1/9	2024-11-21	source ↗

The Rank column shows this model’s position among all models scored on the same benchmark and metric, with the number of competitors after the slash. A #1 shown in red marks the current SOTA. Rows are sorted by rank, then by newest result.
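
The ranking and sort order described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual implementation. The `Result` type and the `rank_of` and `sort_results` helpers are hypothetical names, and the sketch assumes that higher scores are better and that ties in rank and date keep their input order.

```python
from datetime import date
from typing import NamedTuple


class Result(NamedTuple):
    # Hypothetical row type mirroring the table columns used for ordering.
    benchmark: str
    metric: str
    value: float
    rank: int
    recorded: date


def rank_of(score: float, all_scores: list[float]) -> int:
    """1-based rank, assuming higher is better: one plus the count of strictly higher scores."""
    return 1 + sum(1 for s in all_scores if s > score)


def sort_results(results: list[Result]) -> list[Result]:
    """Order by rank ascending, then by date descending (newest first)."""
    # Negating the date ordinal flips the date order while rank stays ascending.
    return sorted(results, key=lambda r: (r.rank, -r.recorded.toordinal()))


# The two rows from the table above.
rows = [
    Result("LVIS v1.0", "box-ap", 71.4, 1, date(2024, 11, 21)),
    Result("LVIS v1.0", "mask-ap", 67.0, 1, date(2024, 11, 21)),
]
ordered = sort_results(rows)
```

Both rows share the same rank and date, so the stable sort leaves them in their input order. How the page itself breaks such ties is not stated.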

§ 03 · Strengths by area