Microsoft Common Objects in Context.

Microsoft COCO is the gold standard for large-scale object detection, segmentation, and captioning, with 330k+ images, 1.5M+ object instances, and 80 categories. Primary metric is box mAP averaged over 10 IoU thresholds (0.5:0.95).

Paper ↗Download dataset Submit a result ↵

§ 01 · Leaderboard

Best published scores.

24 results indexed across 2 metrics. Shaded row marks current SOTA; ties broken by submission date.

Primary: box-map · higher is better
All metrics: box-map, mAP

box-map· primary

11 rows

#	Model	Org	Submitted	Paper / code	box-map
01	ScyllaNetAPI	Scylla Technologies	Sep 2025	editorial	66.12
02	Thinker	UBTECH	Aug 2024	editorial	66
03	CW_Detection	Independent	Jan 2025	editorial	66
04	SenseTime BasemodelAPI	SenseTime	Nov 2024	editorial	66
05	InternImage-H (OneFormer)OSS	PJLab & Tsinghua	Mar 2024	InternImage: Exploring Large-Scale Vision Foundation Mod…	65.50
06	DINO-ViT-LOSS	IDEA-Research	Mar 2023	DINO: DETR with Improved DeNoising Anchor Boxes for End-…	63.30
07	ViT-Adapter-LOSS	Nanjing University	Nov 2022	Vision Transformer Adapter for Dense Predictions	60.50
08	Swin-L (Cascade R-CNN)OSS	Microsoft Research	Jul 2021	Swin Transformer: Hierarchical Vision Transformer using …	58.90
09	DETROSS	Meta AI / FAIR	May 2020	editorial	43.30
10	Mask R-CNNOSS	Meta AI / FAIR	Mar 2017	Mask R-CNN	39.80
11	Faster R-CNNOSS	Microsoft Research	Jun 2015	editorial	37.40

mAP

13 rows

#	Model	Org	Submitted	Paper / code	mAP
01	Co-DETR (Swin-L)OSS	Research	Mar 2026	arxiv	66
02	Co-DETR (Swin-L)OSS	Research	Dec 2025	arxiv-paper	66
03	InternImage-HOSS	Shanghai AI Lab	Dec 2025	arxiv-paper	65.40
04	InternImage-HOSS	Shanghai AI Lab	Mar 2026	arxiv	65.40
05	DINO (Swin-L)OSS	Research	Dec 2025	arxiv-paper	63.30
06	DINO (Swin-L)OSS	IDEA Research	Mar 2026	arxiv	63.30
07	Grounding DINOOSS	IDEA Research	Mar 2026	arxiv	63
08	EVA-02-LOSS	BAAI	Mar 2026	arxiv	62.30
09	YOLOv10-XOSS	Tsinghua University	Dec 2025	github-readme	57.40
10	EfficientDet-D7xOSS	Google	Dec 2025	google-research	55.10
11	YOLO11xOSS	Ultralytics	Mar 2026	official	54.70
12	YOLOv10-XOSS	Tsinghua University	Mar 2026	arxiv	54.40
13	RT-DETRv2-XOSS	Baidu	Mar 2026	arxiv	54.30

Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.

§ 03 · Progress

9 steps
of state of the art.

Each row below marks a model that broke the previous record on box-map. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.

SOTA line · box-map

Jun 4, 2015Faster R-CNNMicrosoft Research37.40
Mar 20, 2017Mask R-CNNMeta AI / FAIR39.80
May 26, 2020DETRMeta AI / FAIR43.30
Jul 1, 2021Swin-L (Cascade R-CNN)Microsoft Research58.90
Nov 1, 2022ViT-Adapter-LNanjing University60.50
Mar 1, 2023DINO-ViT-LIDEA-Research63.30
Mar 1, 2024InternImage-H (OneFormer)PJLab & Tsinghua65.50
Aug 1, 2024ThinkerUBTECH66
Sep 1, 2025ScyllaNetScylla Technologies66.12

Fig 3 · SOTA-setting models only. 9 entries span Jun 2015 → Sep 2025.

§ 04 · Literature

5 papers
tied to this benchmark.

Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Nov 2022·InternImage-H (OneFormer)
arXiv ↗
Vision Transformer Adapter for Dense Predictions
May 2022·ViT-Adapter-L
arXiv ↗
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Mar 2022·DINO-ViT-L
arXiv ↗
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Mar 2021·Swin-L (Cascade R-CNN)
arXiv ↗
Mask R-CNN
Mar 2017·Mask R-CNN
arXiv ↗

§ 06 · Contribute

Have a score that beats
this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and — if it takes the top — annotate the step on the progress chart with your name.

Submit a result ↵Read submission guide

What a submission needs

01A public checkpoint or API endpoint
02A reproduction script with frozen commit + seed
03Declared evaluation environment (Python, deps)
04One row per metric declared by this dataset
05A contact so we can follow up on discrepancies

Microsoft Common Objects in Context.

Best published scores.

9 stepsof state of the art.

5 paperstied to this benchmark.

Neighbouring benchmarks.

Have a score that beatsthis table?

9 steps
of state of the art.

5 papers
tied to this benchmark.

Have a score that beats
this table?