Who leads the COCO benchmark?

ScyllaNet currently leads COCO with a box mAP of 66.12.

What is the state-of-the-art score on COCO?

The state-of-the-art result on COCO is a box mAP of 66.12, achieved by ScyllaNet as of 2026.

How many models are tracked on COCO?

Codesota tracks 71 models on COCO across 3 metrics.

When was the COCO leaderboard last updated?

The COCO leaderboard on Codesota includes results through 2026, with the earliest tracked result from 2015.

COCO Leaderboard: Computer Vision Object Detection SOTA

Name: Microsoft Common Objects in Context Benchmark Results
Creator: Codesota
Published: 2015-01-01
License: https://creativecommons.org/licenses/by/4.0/

The Standard for Scene Understanding

Before COCO, datasets like PASCAL VOC focused on iconic views of objects. COCO shifted the focus toward contextual understanding: its images contain multiple objects that are often small, occluded, or set against complex backgrounds. This pushed the field toward multi-scale architectures such as Feature Pyramid Networks (FPN) and more robust backbones.

Scale Variation

Objects range from a few pixels to the entire frame, requiring multi-scale feature extraction.

Non-Iconic Views

Objects are shown in natural settings, often partially hidden or at unusual angles.

Evaluation Metric

Primary Metric: AP

COCO uses Average Precision (AP) averaged over 10 IoU thresholds (0.50 to 0.95 with 0.05 steps). This rewards models with high localization accuracy.

• AP₅₀: AP at IoU=0.50
• AP_S: AP for small objects (area < 32² px)
• AP_M: AP for medium objects (area between 32² and 96² px)
• AP_L: AP for large objects (area > 96² px)
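The pieces of this metric can be sketched in a few lines of Python. This is a minimal illustration, not the official evaluation code: the function names are hypothetical, boxes are assumed to be in COCO's [x, y, width, height] format, and the size buckets use the 32² and 96² pixel-area boundaries (the official tooling measures annotation area and may treat the exact boundary values differently).

```python
def iou_xywh(a, b):
    """Intersection over Union of two boxes given as [x, y, width, height]."""
    ax1, ay1, aw, ah = a
    bx1, by1, bw, bh = b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# The 10 IoU thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def size_bucket(area):
    """Assign an object to the small / medium / large bucket by pixel area."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def mean_over_thresholds(ap_per_threshold):
    """Average the per-threshold AP values into the single primary AP number."""
    return sum(ap_per_threshold) / len(ap_per_threshold)
```

A detection counts as a match at a given threshold only if its IoU with a ground-truth box of the same class reaches that threshold, so a loosely localized box can contribute at 0.50 and still be penalized at 0.75 and above.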

SOTA Evolution

The journey from early CNNs to modern Vision Transformers.

Model	Year	mAP	Milestone
Faster R-CNN	2015	37.4	Two-stage paradigm
Mask R-CNN	2017	39.8	Instance Segmentation
DETR	2020	43.3	End-to-end Transformers
Swin-L	2021	58.9	Hierarchical ViT
DINO	2023	63.3	Improved Denoising
ScyllaNet	2025	66.1	Current SOTA

Detection Leaderboard


Rank	Model	Organization	Date	AP
#1	ScyllaNet	Scylla Technologies	2025-09	66.1
#2	CW_Detection	Independent	2025-01	66.0
#3	SenseTime Basemodel	SenseTime	2024-11	66.0
#4	Thinker	UBTECH	2024-08	66.0
#5	InternImage-H (OneFormer)	PJLab & Tsinghua	2024-03	65.5
#6	DINO-ViT-L	IDEA-Research	2023-03	63.3
#7	ViT-Adapter-L	Nanjing University	2022-11	60.5
#8	Swin-L (Cascade R-CNN)	Microsoft Research	2021-07	58.9

Error Analysis

Most modern detectors still struggle with false positives on background textures and with localization errors on small objects. COCO's analysis tools group errors into three categories: clutter, similar categories, and poor localization.
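To make these categories concrete, here is a rough sketch of how a single detection might be sorted into one of them. It is an illustration under stated assumptions, not COCO's analysis code: the function names, the dict layout, and the 0.5 and 0.1 IoU cut-offs are choices made for this example, any overlap with a different class is treated as category confusion without checking how similar the classes are, and one-to-one matching of detections to ground truth is ignored.

```python
def iou_xywh(a, b):
    """IoU of two boxes given as [x, y, width, height]."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def categorize_detection(det, ground_truths, match_iou=0.5, loose_iou=0.1):
    """Label one detection; det and each ground truth have 'bbox' and 'category_id'.

    The thresholds are assumptions for this sketch, not official values.
    """
    best_same = 0.0   # best overlap with a ground-truth box of the same class
    best_other = 0.0  # best overlap with a ground-truth box of another class
    for gt in ground_truths:
        overlap = iou_xywh(det["bbox"], gt["bbox"])
        if gt["category_id"] == det["category_id"]:
            best_same = max(best_same, overlap)
        else:
            best_other = max(best_other, overlap)
    if best_same >= match_iou:
        return "correct"
    if best_same >= loose_iou:
        return "poor_localization"  # right class, box too loose
    if best_other >= loose_iou:
        return "category_confusion"  # overlaps an object of another class
    return "clutter"  # fires on background


ground_truths = [{"bbox": [0, 0, 10, 10], "category_id": 1}]
shifted = {"bbox": [6, 0, 10, 10], "category_id": 1}
label = categorize_detection(shifted, ground_truths)  # IoU is 0.25 here
```

Counting the labels over a whole validation set gives a quick picture of whether a model loses more AP to loose boxes or to background false positives.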

Speed vs. Accuracy

While SOTA models reach 60+ AP, they often run at < 5 FPS. Real-time models like YOLOv11 or RT-DETR target the 45-55 AP range while maintaining 100+ FPS on modern GPUs.
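FPS figures depend heavily on how they are measured. The sketch below shows one simple way to time a detector using only the standard library. The function name and the `infer` callable are hypothetical stand-ins for a real model's forward pass, and the example assumes synchronous execution: on a GPU, the device would normally need to be synchronized before reading the clock.

```python
import time


def measure_fps(infer, inputs, warmup=2):
    """Return frames per second for calling infer(x) on each element of inputs.

    infer is any callable that runs one image through a detector (hypothetical).
    """
    # Warm-up calls are excluded from timing (caches, lazy initialization).
    for x in inputs[:warmup]:
        infer(x)
    start = time.perf_counter()
    for x in inputs:
        infer(x)
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed if elapsed > 0 else float("inf")


# Usage with a dummy "model" that takes about 10 ms per image.
fps = measure_fps(lambda image: time.sleep(0.01), list(range(5)), warmup=1)
```

Batch size, input resolution, numeric precision, and hardware all shift the result, so speed numbers are only comparable when those settings are reported alongside them.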

Dataset Variants

Variant	Status	Size	Description
COCO 2017 Core	Active	118k Train / 5k Val	Standard object detection & segmentation benchmark.
COCO-Stuff	Extension	164k Images	Adds 91 "stuff" categories (sky, grass, wall) for semantic context.
COCO-Keypoints	Extension	250k People	Human pose estimation with 17 annotated keypoints.
COCO-Captions	Extension	330k Images	5 natural language descriptions per image for multimodal tasks.

Foundational Papers

Microsoft COCO: Common Objects in Context

Lin et al. • ECCV 2014

Mask R-CNN

He et al. • ICCV 2017

Swin Transformer: Hierarchical Vision Transformer

Liu et al. • ICCV 2021

InternImage: Exploring Large-Scale Vision Foundation Models

Wang et al. • CVPR 2023

Official Repositories

cocoapi • 6.4k stars

Official API for loading and evaluating COCO data.

detectron2 • 28k stars

Meta AI's next-gen library for object detection.

mmdetection • 26k stars

OpenMMLab detection toolbox with 300+ models.
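Whichever toolbox produces the predictions, the COCO API expects detection results as a JSON list of records with `image_id`, `category_id`, `bbox` in [x, y, width, height], and `score`. The helper below is a minimal sketch of that conversion. The function name is hypothetical, and it assumes the model emits corner-format boxes (x1, y1, x2, y2) and that `category_ids` already holds COCO category ids rather than the model's contiguous class indices.

```python
import json


def to_coco_results(image_id, boxes_xyxy, scores, category_ids):
    """Convert one image's detections to COCO results-format records."""
    results = []
    for (x1, y1, x2, y2), score, category_id in zip(boxes_xyxy, scores, category_ids):
        results.append({
            "image_id": int(image_id),
            "category_id": int(category_id),
            # COCO stores boxes as top-left corner plus width and height.
            "bbox": [float(x1), float(y1), float(x2 - x1), float(y2 - y1)],
            "score": float(score),
        })
    return results


# Usage: one detection on a hypothetical image with id 42.
records = to_coco_results(42, [(10, 20, 50, 80)], [0.9], [1])
payload = json.dumps(records)  # this string is what gets written to the results file
```

COCO's 80 detection categories use non-contiguous ids, so a model trained with class indices 0 to 79 needs a mapping back to the original ids before the records are written.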

Comparison with Other Benchmarks

Benchmark	Focus	Key Difference
LVIS	Long-tail recognition	1000+ categories; addresses class imbalance better than COCO.
PASCAL VOC	Early detection	Smaller scale (20 classes), mostly centered objects.
Open Images	Massive scale	9M images; uses image-level labels and bounding boxes.

Ready to Benchmark?

Download the COCO 2017 dataset and start training your models. Use the official API for standardized evaluation.
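As a starting point, evaluation with the official API (the `pycocotools` package) might look like the sketch below. The helper names and file paths are assumptions for illustration. The `COCO`, `loadRes`, and `COCOeval` calls come from pycocotools, which is imported lazily so that the small naming helper can be used without it installed.

```python
# Order of the 12 summary values that COCOeval stores in .stats for bounding boxes.
STAT_NAMES = [
    "AP", "AP50", "AP75", "AP_S", "AP_M", "AP_L",
    "AR_1", "AR_10", "AR_100", "AR_S", "AR_M", "AR_L",
]


def name_stats(stats):
    """Pair each summary value with a readable name."""
    return {name: float(value) for name, value in zip(STAT_NAMES, stats)}


def evaluate_bbox(annotation_file, results_file):
    """Run box evaluation and return the named summary metrics."""
    # Lazy import: requires `pip install pycocotools`.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO(annotation_file)          # ground-truth annotations
    coco_dt = coco_gt.loadRes(results_file)  # detections in COCO results format
    evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()                    # prints the standard 12-line table
    return name_stats(evaluator.stats)


# Example call (paths are assumptions about where the files were saved):
# metrics = evaluate_bbox("annotations/instances_val2017.json", "detections.json")
# print(metrics["AP"], metrics["AP_S"])
```

The values are reported on a 0 to 1 scale, so an `AP` of 0.661 corresponds to the 66.1 shown on leaderboards.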
