Model card
DINO-X.
IDEA Research · open-source · Unknown params · Unified vision model with DINO-based detection head + large language model
Unified vision model for open-world object detection. Achieves 67.0 mask AP on LVIS v1.0 minival, SOTA at time of release (Nov 2024). Supports open-vocabulary and grounded detection. arXiv 2411.14347.
§ 01 · Benchmarks
Every benchmark DINO-X has a recorded score for.
| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | LVIS v1.0 | Computer Vision · Object Detection | box-ap | 71.4% | #1 | 2024-11-21 | source ↗ |
| 02 | LVIS v1.0 | Computer Vision · Object Detection | mask-ap | 67.0% | #1 | 2024-11-21 | source ↗ |
The Rank column shows this model’s position among all other models scored on the same benchmark and metric (the number of competitors follows the slash). A red #1 marks the current SOTA. Rows are sorted by rank, then by newest result.
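The ordering described above (rank first, then newest result) can be sketched in a few lines of Python. This is only an illustration: the `Result` record, its field names, and the `sort_results` helper are hypothetical and are not taken from the site's actual implementation. The two rows are the ones from the table.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Result:
    # Hypothetical record shape; field names are assumptions for illustration.
    benchmark: str
    metric: str
    value: float
    rank: int
    recorded: date


# The two rows listed in the table above.
rows = [
    Result("LVIS v1.0", "mask-ap", 67.0, 1, date(2024, 11, 21)),
    Result("LVIS v1.0", "box-ap", 71.4, 1, date(2024, 11, 21)),
]


def sort_results(results):
    """Sort by ascending rank, then by newest date first.

    The date ordinal is negated so that later dates sort earlier.
    Python's sort is stable, so full ties keep their input order.
    """
    return sorted(results, key=lambda r: (r.rank, -r.recorded.toordinal()))


for r in sort_results(rows):
    print(f"#{r.rank} {r.benchmark} {r.metric} {r.value} {r.recorded.isoformat()}")
```

Both rows here share the same rank and date, so a stable sort leaves them in whatever order they were supplied; any further tie-breaker the page might use is not stated in the source.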
§ 03 · Papers
1 paper with results for DINO-X.
- 2024-11-21 · Computer Vision · 2 results
  DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
§ 04 · Related models
Other IDEA Research models scored on Codesota.
§ 05 · Sources & freshness
Where these numbers come from.
arxiv: 2 results
2 of 2 rows marked verified.