Image Segmentation
Classify each pixel in an image. Enables precise object boundaries for medical imaging, autonomous vehicles, and image editing.
How Image Segmentation Works
A technical deep-dive into image segmentation. From pixel-level classification to SAM's promptable foundation model.
Segmentation Types
Three main types: semantic (what class), instance (which object), and panoptic (both).
Semantic Segmentation
Label every pixel with a class
Instance Segmentation
Distinguish individual objects
Panoptic Segmentation
Semantic + Instance combined
| Aspect | Semantic | Instance | Panoptic |
|---|---|---|---|
| Distinguishes instances? | No | Yes | Yes |
| Background classes? | Yes | No | Yes |
| Overlapping masks? | No | Yes | No |
| Main metric | mIoU | AP | PQ |
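To make the table concrete, here is a minimal NumPy sketch of how each output type is commonly stored. The toy 4x6 scene, the class IDs, and the `class_id * 1000 + instance_id` panoptic packing are illustrative assumptions (one convention among several), not a required format.

```python
import numpy as np

# Hypothetical class IDs for a toy scene: 0 = sky (stuff), 1 = car (thing)
SKY, CAR = 0, 1

# Semantic: one class ID per pixel; the two cars are indistinguishable.
semantic = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Instance: one binary mask per object; background pixels get no mask.
car_a = np.zeros(semantic.shape, dtype=bool)
car_a[1:3, 1:3] = True
car_b = np.zeros(semantic.shape, dtype=bool)
car_b[1:3, 4:6] = True
instance_masks = [car_a, car_b]

# Panoptic: every pixel gets class and instance packed into one integer.
panoptic = semantic * 1000  # stuff pixels keep instance id 0
for instance_id, mask in enumerate(instance_masks, start=1):
    panoptic[mask] = CAR * 1000 + instance_id

print(np.unique(semantic).tolist())  # [0, 1]
print(len(instance_masks))           # 2
print(np.unique(panoptic).tolist())  # [0, 1001, 1002]
```

The semantic map cannot tell the two cars apart, the instance masks say nothing about the sky, and the panoptic map covers every pixel exactly once.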
Architecture Evolution
From FCN to SAM 2: a decade of progress in segmentation architectures.
Encoder-Decoder (U-Net style)
Downsampling captures context, upsampling recovers spatial detail. Skip connections preserve fine features.
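The data flow of an encoder-decoder with skip connections can be sketched in a few lines of NumPy. This is only a shape-level illustration under simplifying assumptions: a real U-Net uses learned convolutions at every stage, whereas here average pooling and nearest-neighbour upsampling stand in for them, and the function names are hypothetical.

```python
import numpy as np

def downsample(x):
    """2x2 average pooling over an (H, W, C) feature map."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    """2x nearest-neighbour upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def tiny_unet_pass(image):
    # Encoder: keep each resolution's features for the skip connections.
    enc1 = image                   # (H, W, C)
    enc2 = downsample(enc1)        # (H/2, W/2, C)
    bottleneck = downsample(enc2)  # (H/4, W/4, C)

    # Decoder: upsample, then concatenate the matching encoder features.
    dec2 = np.concatenate([upsample(bottleneck), enc2], axis=-1)  # (H/2, W/2, 2C)
    dec1 = np.concatenate([upsample(dec2), enc1], axis=-1)        # (H, W, 3C)
    return dec1

features = tiny_unet_pass(np.random.rand(8, 8, 3))
print(features.shape)  # (8, 8, 9)
```

The last three channels of the output are the untouched input, which is the point of a skip connection: full-resolution detail reaches the decoder without passing through the bottleneck.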
Transformer-Based (SAM style)
Pre-computed image embeddings plus lightweight prompt encoding enable real-time interactive segmentation.
SAM: Segment Anything Model
Meta's foundation model for segmentation, trained on 11M images and 1.1B masks. It is promptable: segment objects with points, boxes, or coarse masks (text prompts were explored in the paper but are not part of the released model).
SAM Architecture
Prompt Types
SAM (2023)
Image only
- +Zero-shot transfer to any domain
- +Real-time with pre-computed embeddings
- +Ambiguity-aware (3 mask outputs)
- -No video/temporal support
SAM 2 (2024)
Image + Video
- +Unified image and video model
- +Memory mechanism for tracking
- +6x faster than SAM on image segmentation
- +Streaming architecture
How SAM Works
Mask Formats & Representation
How segmentation masks are stored and encoded.
Run-Length Encoding (RLE)
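The COCO dataset uses RLE to compress binary masks. Instead of storing every pixel, it stores the lengths of alternating runs of 0s and 1s (starting with 0s, read in column-major order) together with the mask size as [height, width].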
{"counts": [3, 4, 2, 6, 2, 6, 3, 4, 2], "size": [4, 8]}
Common Mask Operations
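As a sketch of the most common operations on the RLE example above, the code below decodes it to a binary mask, computes area and bounding box, and encodes it back. It handles only the uncompressed list form of `counts` (COCO annotations often use a compressed string form, for which pycocotools is the usual tool), and `rle_decode`, `rle_encode`, and `mask_bbox` are hypothetical helper names.

```python
import numpy as np

def rle_decode(rle):
    """Decode an uncompressed COCO-style RLE dict into a binary (H, W) mask."""
    height, width = rle["size"]
    flat = np.zeros(height * width, dtype=np.uint8)
    position, value = 0, 0  # runs alternate 0s and 1s, starting with 0s
    for run in rle["counts"]:
        flat[position:position + run] = value
        position += run
        value = 1 - value
    # COCO stores pixels column by column (Fortran order).
    return flat.reshape((height, width), order="F")

def rle_encode(mask):
    """Encode a binary (H, W) mask back into the same RLE dict format."""
    counts, current, run = [], 0, 0
    for pixel in mask.flatten(order="F"):
        if pixel == current:
            run += 1
        else:
            counts.append(run)  # a mask starting with 1 yields a leading 0 run
            current, run = pixel, 1
    counts.append(run)
    return {"counts": counts, "size": list(mask.shape)}

def mask_bbox(mask):
    """Return the tight bounding box as [x, y, width, height] (COCO order)."""
    ys, xs = np.nonzero(mask)
    x0, y0 = xs.min(), ys.min()
    return [int(x0), int(y0), int(xs.max() - x0 + 1), int(ys.max() - y0 + 1)]

rle = {"counts": [3, 4, 2, 6, 2, 6, 3, 4, 2], "size": [4, 8]}
mask = rle_decode(rle)
print(mask.shape)            # (4, 8)
print(int(mask.sum()))       # area in pixels: 20
print(mask_bbox(mask))       # [0, 0, 8, 4]
print(rle_encode(mask) == rle)  # True
```

The run lengths sum to 32, which matches the 4 x 8 mask size; checking that sum is a cheap sanity test when reading RLE data.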
Segmentation Metrics
How to measure segmentation quality.
IoU (Intersection over Union) for Masks
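IoU divides the number of pixels two masks share by the number of pixels either mask covers: IoU = |A ∩ B| / |A ∪ B|. A minimal NumPy sketch follows; `mask_iou` is a hypothetical helper name, and treating two empty masks as a perfect match is an assumption (some evaluators skip such cases instead).

```python
import numpy as np

def mask_iou(pred, target):
    """Intersection over Union between two binary masks of the same shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)

# Two 4x2-row bands that overlap on one row
pred = np.zeros((4, 4), dtype=bool)
pred[0:2, :] = True    # rows 0-1, 8 pixels
target = np.zeros((4, 4), dtype=bool)
target[1:3, :] = True  # rows 1-2, 8 pixels

print(round(mask_iou(pred, target), 3))  # 4 shared / 12 covered = 0.333
```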
ADE20K Semantic Segmentation Leaderboard
| Model | Backbone | mIoU (val) | Year |
|---|---|---|---|
| InternImage-H | InternImage-H | 62.9% | 2023 |
| Mask2Former | Swin-L | 57.3% | 2022 |
| SegFormer | MiT-B5 | 51.8% | 2021 |
| DeepLab v3+ | ResNet-101 | 45.7% | 2018 |
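The mIoU figures in the table are per-class IoU values averaged over classes. The sketch below computes it for a single label map from a confusion matrix; benchmark code typically accumulates that matrix over the whole validation set before averaging. The function name `mean_iou`, the `ignore_index=255` default, and the choice to skip classes absent from both maps are assumptions for illustration.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in the prediction or ground truth."""
    valid = target != ignore_index
    # Confusion matrix: rows = ground-truth class, columns = predicted class
    index = target[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    confusion = np.bincount(index, minlength=num_classes ** 2)
    confusion = confusion.reshape(num_classes, num_classes)

    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    present = union > 0  # skip classes absent from both maps
    return float((intersection[present] / union[present]).mean())

target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
pred = np.array([[0, 0, 0, 1],
                 [0, 0, 1, 1]])

# Class 0: 4 / 5 = 0.80, class 1: 3 / 4 = 0.75, class 2 is absent
print(round(mean_iou(pred, target, num_classes=3), 3))  # 0.775
```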
Code Examples
Get started with segmentation in Python.
```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load the SAM model (ViT-H backbone) from a local checkpoint
sam = sam_model_registry['vit_h'](checkpoint='sam_vit_h.pth')
predictor = SamPredictor(sam)

# Load the image; OpenCV reads BGR, SAM expects RGB
image = cv2.imread('image.jpg')
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
predictor.set_image(image_rgb)  # computes the image embedding once

# Prompt with a point (x, y) and a label (1 = foreground, 0 = background)
input_point = np.array([[500, 375]])
input_label = np.array([1])

# Generate candidate masks; multimask_output=True returns three
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True,
)

# Keep the mask with the highest predicted quality score
best_mask = masks[np.argmax(scores)]
```
Quick Reference
- SAM / SAM 2
- Grounded SAM
- SEEM
- Mask2Former
- OneFormer
- SegFormer
- SAM 2
- XMem
- DEVA
Use Cases
- ✓Medical image analysis
- ✓Autonomous driving
- ✓Background removal
- ✓Satellite imagery analysis
Architectural Patterns
Semantic Segmentation
Classify every pixel into categories (no instance distinction).
- +Dense predictions
- +Well-suited for scene parsing
- -Doesn't separate instances
- -Needs full annotations
Instance Segmentation
Segment and distinguish individual object instances.
- +Separates objects
- +Combines detection + segmentation
- -More complex
- -Higher compute cost
Panoptic Segmentation
Unified semantic + instance segmentation.
- +Complete scene understanding
- +Both stuff and things
- -Most complex
- -Requires rich annotations
Implementations
Open Source
Segment Anything (SAM)
Apache 2.0
Zero-shot segmentation from point or box prompts.
nnU-Net
Apache 2.0
Self-configuring for medical imaging. Top performer on many challenges.
Benchmarks
Quick Facts
- Input: Image
- Output: Segmentation Mask
- Implementations: 5 open source, 0 API
- Patterns: 3 approaches