Optical Flow
Estimate pixel-wise motion between frames. Useful for video editing, stabilization, and robotics.
How Optical Flow Works
A deep-dive into optical flow: the art of perceiving motion from images. From classical variational methods to modern transformer architectures.
What is Optical Flow?
Optical flow captures apparent motion between consecutive video frames. For each pixel, it answers: "Where did this pixel move to in the next frame?"
Motion Between Frames
Between two consecutive frames, objects shift position; optical flow quantifies that per-pixel shift.
The Flow Field
Optical flow produces a 2D vector field. Each arrow shows where that pixel "moves."
The output is an array of shape (H, W, 2), where each pixel has a (dx, dy) displacement vector.

The Brightness Constancy Assumption
Classical optical flow methods rest on a fundamental assumption: a pixel's intensity does not change as it moves. If pixel (x, y) has intensity I at time t, then at time t+1, the same point (now at position x+dx, y+dy) should have the same intensity.
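Formally, if a point at (x, y) with intensity I(x, y, t) moves by (dx, dy) over dt, brightness constancy states (a standard formulation, not tied to any single method):

$$I(x, y, t) = I(x + dx,\; y + dy,\; t + dt)$$

A first-order Taylor expansion turns this into the optical flow constraint equation:

$$I_x u + I_y v + I_t = 0$$

where (u, v) is the flow and I_x, I_y, I_t are partial derivatives of the image. This is one equation in two unknowns per pixel (the aperture problem), so every method adds a second constraint: local windows in Lucas-Kanade, global smoothness in Horn-Schunck, learned priors in deep networks.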
The assumption breaks down in several common situations:

- Lighting changes (shadows, reflections)
- Object occlusion/disocclusion
- Transparent or reflective surfaces
- Motion blur
- Large displacements between frames

Modern methods compensate for these failure modes with:

- Deep learning: learned features robust to appearance change
- Cost volumes: explicit matching instead of raw intensity comparison
- Multi-scale pyramids: handle large motion
- Occlusion detection networks
Dense vs Sparse Optical Flow
Two fundamentally different approaches: compute flow for every pixel (dense) or only for selected keypoints (sparse).
Dense Flow: compute flow for every pixel.

- + Complete motion information
- + Smooth visualization
- + Better for segmentation
- - Computationally expensive
- - Slower inference

Sparse Flow: track selected feature points only. Much cheaper per frame, but gives motion only where features exist (see the Lucas-Kanade sketch below).
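For sparse flow, the classic recipe is Shi-Tomasi corners plus pyramidal Lucas-Kanade. A minimal OpenCV sketch (the frame filenames are placeholder assumptions):

```python
import cv2

# Load two consecutive frames as grayscale (hypothetical filenames).
g1 = cv2.cvtColor(cv2.imread('frame1.jpg'), cv2.COLOR_BGR2GRAY)
g2 = cv2.cvtColor(cv2.imread('frame2.jpg'), cv2.COLOR_BGR2GRAY)

# Pick up to 200 strong corners to track.
p0 = cv2.goodFeaturesToTrack(g1, maxCorners=200, qualityLevel=0.01,
                             minDistance=7)

# Track them into the next frame; st flags points found successfully.
p1, st, err = cv2.calcOpticalFlowPyrLK(
    g1, g2, p0, None,
    winSize=(15, 15), maxLevel=2,
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

good_new = p1[st.flatten() == 1]
good_old = p0[st.flatten() == 1]
print(f'Tracked {len(good_new)} of {len(p0)} points')
```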
Visualizing Flow: The Color Wheel
Flow vectors are typically visualized with an HSV color encoding: hue encodes direction, value (brightness) encodes magnitude, and saturation is held at maximum.
Direction Color Wheel
HSV Encoding Formula
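With the OpenCV convention used in the code example below (hue ranges over [0, 180)), the encoding is:

$$H = \frac{\theta_{\text{deg}}}{2}, \qquad S = 255, \qquad V = 255 \cdot \frac{\|\mathbf{f}\|}{\max \|\mathbf{f}\|}$$

where θ is the flow direction and ‖f‖ the flow magnitude at each pixel.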
Magnitude Encoding
Brighter colors indicate faster motion. Black or dark regions indicate little to no movement.
Direction legend, for the OpenCV encoding above (note that image y points down):

- Red = moving right
- Green = moving down
- Cyan = moving left
- Blue/violet = moving up
- Dark = static / slow
- Bright = fast motion
Methods: From Classical to Transformers
Four decades of progress: from variational calculus to attention mechanisms.
- Lucas-Kanade (1981): local least-squares over small windows; the classic sparse tracker
- Horn-Schunck (1981): global variational method with a smoothness prior
- Farneback (2003): dense flow via polynomial expansion of local neighborhoods
- FlowNet (2015): first end-to-end CNN for flow estimation
- PWC-Net (2018): pyramid, warping, and cost volume in a compact network
- RAFT (2020): all-pairs correlation volume with recurrent refinement
- GMFlow (2022): reframes flow as global matching with transformers
- FlowFormer (2022): transformer encoding of the cost volume
- VideoFlow (2023): multi-frame estimation exploiting temporal context
RAFT Architecture (Current Standard)
RAFT builds a 4D all-pairs correlation volume at 1/8 resolution, then iteratively refines the flow estimate with a ConvGRU update operator. Inference typically runs 12-32 refinement iterations, converging to accurate flow even for large motions.
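A minimal inference sketch with torchvision's pretrained RAFT (assumes a recent torchvision, roughly >= 0.13; frame filenames are placeholders):

```python
import torch
import torchvision.transforms.functional as TF
from torchvision.io import read_image
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval()

# Load frames as uint8 (C, H, W) tensors.
img1 = read_image('frame1.jpg')
img2 = read_image('frame2.jpg')

# RAFT requires H and W divisible by 8.
img1 = TF.resize(img1, [520, 960], antialias=True)
img2 = TF.resize(img2, [520, 960], antialias=True)

# The bundled transforms convert to float and normalize to [-1, 1].
batch1, batch2 = weights.transforms()(img1.unsqueeze(0), img2.unsqueeze(0))

with torch.no_grad():
    flows = model(batch1, batch2)  # one estimate per refinement iteration

flow = flows[-1]  # final, most-refined estimate; shape (1, 2, H, W)
```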
Applications
Optical flow is a fundamental building block for motion understanding.
Video Stabilization
Estimate camera motion from optical flow, then apply inverse transform to compensate. Flow reveals global motion patterns that distinguish camera shake from object motion.
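As a rough illustration, the sketch below treats the per-pixel median of a Farneback flow field as the global camera translation and warps it out; real stabilizers fit full homographies and smooth trajectories over time. The pure-translation assumption and filenames are mine, not from any particular library:

```python
import cv2
import numpy as np

prev = cv2.imread('frame1.jpg')
curr = cv2.imread('frame2.jpg')
g1 = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
g2 = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)

flow = cv2.calcOpticalFlowFarneback(g1, g2, None, 0.5, 3, 15, 3, 5, 1.2, 0)

# Median displacement ~ dominant (camera) motion; moving objects are outliers.
dx = float(np.median(flow[..., 0]))
dy = float(np.median(flow[..., 1]))

# Shift the current frame back by the estimated camera translation.
M = np.float32([[1, 0, -dx], [0, 1, -dy]])
h, w = curr.shape[:2]
stabilized = cv2.warpAffine(curr, M, (w, h))
```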
Action Recognition
Flow provides motion features complementary to RGB appearance. Two-stream networks process RGB and flow separately, then fuse predictions.
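A toy late-fusion sketch of the two-stream idea (the tiny CNNs and shapes here are invented for illustration; the published architectures are far larger):

```python
import torch
import torch.nn as nn

def make_stream(in_channels: int, num_classes: int) -> nn.Sequential:
    # Deliberately tiny stand-in for a real backbone.
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, num_classes),
    )

num_classes = 10
rgb_stream = make_stream(3, num_classes)        # appearance stream
flow_stream = make_stream(2 * 5, num_classes)   # 5 stacked (dx, dy) fields

rgb = torch.randn(1, 3, 224, 224)
flow_stack = torch.randn(1, 10, 224, 224)
logits = (rgb_stream(rgb) + flow_stream(flow_stack)) / 2  # late fusion
```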
Video Frame Interpolation
Generate intermediate frames by warping and blending based on bidirectional flow. Essential for slow-motion, frame rate conversion, and video compression.
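A crude mid-frame sketch under a linear-motion assumption, ignoring occlusions (filenames are placeholders): sample frame 1 half a step backward along the flow and frame 2 half a step forward, then blend.

```python
import cv2
import numpy as np

f1 = cv2.imread('frame1.jpg')
f2 = cv2.imread('frame2.jpg')
g1 = cv2.cvtColor(f1, cv2.COLOR_BGR2GRAY)
g2 = cv2.cvtColor(f2, cv2.COLOR_BGR2GRAY)
flow = cv2.calcOpticalFlowFarneback(g1, g2, None, 0.5, 3, 15, 3, 5, 1.2, 0)

h, w = g1.shape
grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
grid_x = grid_x.astype(np.float32)
grid_y = grid_y.astype(np.float32)

# Backward-warp both frames toward t = 0.5 and average.
back = cv2.remap(f1, grid_x - 0.5 * flow[..., 0],
                 grid_y - 0.5 * flow[..., 1], cv2.INTER_LINEAR)
fwd = cv2.remap(f2, grid_x + 0.5 * flow[..., 0],
                grid_y + 0.5 * flow[..., 1], cv2.INTER_LINEAR)
mid = cv2.addWeighted(back, 0.5, fwd, 0.5, 0)
cv2.imwrite('mid_frame.png', mid)
```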
Object Tracking
Propagate object masks or bounding boxes using flow-based motion estimation. Combines well with detection for robust multi-object tracking.
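A minimal box-propagation sketch (the helper and its box format are hypothetical): shift a box by the median flow inside it, which tolerates a few outlier pixels.

```python
import numpy as np

def propagate_box(box, flow):
    """box = (x1, y1, x2, y2); flow = (H, W, 2) from frame t to t+1."""
    x1, y1, x2, y2 = box
    region = flow[y1:y2, x1:x2]
    dx = float(np.median(region[..., 0]))
    dy = float(np.median(region[..., 1]))
    return (int(round(x1 + dx)), int(round(y1 + dy)),
            int(round(x2 + dx)), int(round(y2 + dy)))
```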
Code Examples
From OpenCV classical methods to state-of-the-art deep learning models.
```python
import cv2
import numpy as np
# Read two consecutive frames
frame1 = cv2.imread('frame1.jpg')
frame2 = cv2.imread('frame2.jpg')
# Convert to grayscale
gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
# Calculate dense optical flow (Farneback method)
flow = cv2.calcOpticalFlowFarneback(
    gray1, gray2,
    None,
    pyr_scale=0.5,   # pyramid scale
    levels=3,        # number of pyramid levels
    winsize=15,      # averaging window size
    iterations=3,    # iterations per pyramid level
    poly_n=5,        # polynomial expansion neighborhood
    poly_sigma=1.2,  # Gaussian std for derivatives
    flags=0
)
# flow shape: (H, W, 2) - dx and dy per pixel
print(f'Flow shape: {flow.shape}')
print(f'Flow range: [{flow.min():.1f}, {flow.max():.1f}] pixels')
# Convert flow to HSV visualization
magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
hsv = np.zeros_like(frame1)
hsv[..., 0] = angle * 180 / np.pi / 2 # Hue = direction
hsv[..., 1] = 255 # Saturation = max
hsv[..., 2] = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX)
flow_rgb = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
cv2.imwrite('flow_visualization.png', flow_rgb)
```

Quick Reference
OpenCV (classical):

- cv2.calcOpticalFlowFarneback
- cv2.DISOpticalFlow
- cv2.calcOpticalFlowPyrLK

Deep models:

- RAFT (torchvision)
- FlowFormer
- GMFlow

Benchmark datasets:

- Sintel (synthetic)
- KITTI (driving)
- Spring (high-res)
Use Cases

- ✓ Video stabilization
- ✓ Motion-based tracking
- ✓ AR occlusion handling
- ✓ Autonomy perception
Architectural Patterns
Correlation Volume CNNs
Cost volumes with iterative refinement (RAFT/FlowNet2).
Transformer Flow
Global matching with attention (GMFlow/FlowFormer).
Quick Facts

- Input: Image
- Output: Structured data (flow field)
- Implementations: 3 open source, 0 API
- Patterns: 2 approaches