Codesota · Tasks · Semantic Segmentation

Semantic Segmentation.

Semantic segmentation assigns a class label to every pixel. It is the dense prediction problem that underpins autonomous driving, medical imaging, and satellite image analysis. FCN (2015) showed that image classifiers could be repurposed for pixel labeling, DeepLab brought atrous (dilated) convolutions and CRF post-processing to the task, and SegFormer (2021) showed that transformers excel here too. State-of-the-art on Cityscapes exceeds 85 mIoU, while ADE20K, with its 150 classes, remains much harder. The frontier has moved toward universal segmentation models such as Mask2Former, which handle semantic, instance, and panoptic segmentation in a single architecture.
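
To make "a class label to every pixel" concrete, here is a minimal sketch in plain Python. It assumes a model has already produced one score per class per pixel; the `label_map` helper, the nested-list layout `scores[class][row][col]`, and the toy numbers are all illustrative assumptions, not the output format of any particular model listed on this page.

```python
def label_map(scores):
    """Turn per-class score maps into a single label map.

    scores[c][y][x] is the (hypothetical) score of class c at pixel (y, x).
    The result has one class index per pixel: the highest-scoring class.
    """
    num_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]


# Toy 2x2 image with 3 classes (values are made up for illustration).
scores = [
    [[0.90, 0.20], [0.10, 0.30]],  # class 0
    [[0.05, 0.70], [0.20, 0.30]],  # class 1
    [[0.05, 0.10], [0.70, 0.40]],  # class 2
]
labels = label_map(scores)
print(labels)  # [[0, 1], [2, 2]]
```

The output has the same height and width as the input image, which is what distinguishes dense prediction from whole-image classification.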

Canonical metric: mIoU

§ 02 · Canonical benchmark

The reference dataset.

ADE20K

A complex scene parsing benchmark: about 20K training images and 2K validation images, annotated with 150 object categories.

Primary metric: mIoU
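
For readers unfamiliar with the metric, mIoU (mean intersection over union) averages, over classes, the ratio of correctly labeled pixels to the union of predicted and ground-truth pixels for that class. The sketch below is a plain-Python illustration under stated assumptions: label maps are flattened lists of class indices, the value 255 marks unlabeled pixels (a common convention, assumed here), classes absent from both prediction and ground truth are skipped, and the result is scaled to 0-100 to match the scores in the table. The function name `mean_iou` is hypothetical and this is not the official ADE20K evaluation code.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU (0-100) over classes that occur in pred or target."""
    # conf[t][p] counts pixels with true class t and predicted class p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # assumed convention: unlabeled pixels are not scored
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]                                        # true positives
        fn = sum(conf[c]) - tp                                 # missed pixels of class c
        fp = sum(conf[r][c] for r in range(num_classes)) - tp  # wrongly predicted as c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both maps
            ious.append(tp / union)
    return 100.0 * sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 label maps, flattened row by row, with 3 classes.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(round(score, 1))  # 50.0
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, which average to 0.5. Benchmark implementations typically accumulate the confusion counts over the whole validation set before computing IoU, rather than averaging per-image scores.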

§ 03 · Top 10

Leading models.

Leading models on ADE20K.

#	Model	mIoU	Year	Source
★	InternImage-H	62.9	2025	paper ↗
2	BEiT-3 (ViT-L)	62.8	2026	paper ↗
3	DINOv3 + Mask2Former (simple)	62.6	2025	paper ↗
4	DINOv2 (ViT-g) + Linear	62.0	2026	paper ↗
5	EoMT (ViT-L)	58.4	2025	paper ↗
6	BEiT-L+	57.9	2021	paper ↗
7	Mask2Former (Swin-L)	57.3	2025	paper ↗
8	Mask2Former (Swin-L)	57.3	2026	paper ↗
9	OneFormer (Swin-L)	57.0	2022	paper ↗
10	Mask2Former + Swin-L-FaPN	56.4	2021	paper ↗