
Optical Flow

Estimate pixel-wise motion between frames. Useful for video editing, stabilization, and robotics.

How Optical Flow Works

A deep-dive into optical flow: the art of perceiving motion from images. From classical variational methods to modern transformer architectures.

1. What is Optical Flow?

Optical flow captures apparent motion between consecutive video frames. For each pixel, it answers: "Where did this pixel move to in the next frame?"

Motion Between Frames

[Interactive demo: toggling between frame t and frame t+1 shows how objects shift position. The blue box moves right and down, the green circle moves left, and the orange triangle stays static (no flow).]

The Flow Field

Optical flow produces a 2D vector field. Each arrow shows where that pixel "moves."

Output format: For an image of size H x W, optical flow produces a tensor of shape (H, W, 2) where each pixel has a (dx, dy) displacement vector.
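
To make this format concrete, here is a minimal sketch (placeholder arrays stand in for real frames and flow) that uses the per-pixel (dx, dy) vectors to backward-warp the second frame with cv2.remap; with accurate flow, the result approximates the first frame wherever brightness constancy holds:

import numpy as np
import cv2

# Placeholder data -- substitute real frames and estimated flow
H, W = 240, 320
frame2 = np.random.rand(H, W).astype(np.float32)   # second frame (grayscale)
flow = np.zeros((H, W, 2), dtype=np.float32)       # flow[y, x] = (dx, dy)

# Backward warping: reconstruct frame 1 by sampling frame 2 at (x+dx, y+dy)
grid_x, grid_y = np.meshgrid(np.arange(W), np.arange(H))
map_x = (grid_x + flow[..., 0]).astype(np.float32)
map_y = (grid_y + flow[..., 1]).astype(np.float32)
frame1_approx = cv2.remap(frame2, map_x, map_y, cv2.INTER_LINEAR)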

The Brightness Constancy Assumption

Classical optical flow methods rest on a fundamental assumption: a pixel's intensity does not change as it moves. If pixel (x, y) has intensity I at time t, then at time t+1, the same point (now at position x+dx, y+dy) should have the same intensity.

Brightness Constancy Equation:
I(x, y, t) = I(x + dx, y + dy, t + dt)
Linearized (first-order Taylor expansion):
Ix*u + Iy*v + It = 0
where Ix, Iy, It are the spatial and temporal image gradients, and u = dx/dt, v = dy/dt are the flow components
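
The linearized constraint is easy to verify numerically. This small self-contained check (a synthetic example, not from any library) shifts a smooth image by a known sub-pixel motion (u, v) and confirms that Ix*u + Iy*v + It stays close to zero:

import numpy as np

H, W = 64, 64
y, x = np.mgrid[0:H, 0:W].astype(np.float64)
I0 = np.sin(x / 8.0) + np.cos(y / 11.0)               # frame at time t
u, v = 0.3, -0.2                                       # known motion
I1 = np.sin((x - u) / 8.0) + np.cos((y - v) / 11.0)    # frame at t+1

Iy_, Ix_ = np.gradient(I0)   # spatial gradients (axis 0 = y, axis 1 = x)
It_ = I1 - I0                # temporal gradient

residual = Ix_ * u + Iy_ * v + It_
print(f'mean |residual|: {np.abs(residual).mean():.4f}')  # near zero
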
When This Assumption Fails
  • Lighting changes (shadows, reflections)
  • Object occlusion/disocclusion
  • Transparent or reflective surfaces
  • Motion blur
  • Large displacements between frames

Modern Solutions
  • Deep learning: learns robust features
  • Cost volumes: explicit matching
  • Multi-scale pyramids: handle large motion
  • Occlusion detection networks
2. Dense vs Sparse Optical Flow

Two fundamentally different approaches: compute flow for every pixel (dense) or only for selected keypoints (sparse).

Dense Flow: compute a flow vector for every pixel.

Sparse Flow: track only selected feature points (keypoints).

Dense Flow

Output: full motion field (H x W x 2)

Advantages
  • Complete motion information
  • Smooth visualization
  • Better for segmentation

Disadvantages
  • Computationally expensive
  • Slower inference

Example methods: Farneback, FlowNet, RAFT, GMFlow

When to use: video editing, segmentation, action recognition, any task needing complete motion information.
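
For contrast with the dense Farneback example in the code section below, here is a minimal sparse-flow sketch using OpenCV's pyramidal Lucas-Kanade tracker (frame file names are hypothetical):

import cv2

gray1 = cv2.imread('frame1.jpg', cv2.IMREAD_GRAYSCALE)
gray2 = cv2.imread('frame2.jpg', cv2.IMREAD_GRAYSCALE)

# Select Shi-Tomasi corners worth tracking
p0 = cv2.goodFeaturesToTrack(gray1, maxCorners=200,
                             qualityLevel=0.01, minDistance=10)

# Track the keypoints into the next frame (sparse flow)
p1, status, err = cv2.calcOpticalFlowPyrLK(gray1, gray2, p0, None,
                                           winSize=(21, 21), maxLevel=3)

# Keep only the successfully tracked points
good_old = p0[status.flatten() == 1]
good_new = p1[status.flatten() == 1]
print(f'Tracked {len(good_new)} of {len(p0)} keypoints')
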
3. Visualizing Flow: The Color Wheel

Flow vectors are typically visualized with an HSV color encoding: hue encodes direction and value (brightness) encodes magnitude.

Direction Color Wheel

[Color wheel: hue maps motion direction: right (0°), down (90°), left (180°), up (270°).]

HSV Encoding Formula

// Compute angle and magnitude
angle = atan2(dy, dx)          // in (-pi, pi]; wrap negatives to [0, 2*pi)
magnitude = sqrt(dx^2 + dy^2)
// Map to HSV
H = angle / (2*pi) * 360       // hue = direction
S = 1.0                        // full saturation
V = normalize(magnitude)       // value = speed

Magnitude Encoding


Brighter colors indicate faster motion. Black or dark regions indicate little to no movement.

Reading Flow Visualizations (for the HSV encoding above)
  • Red = moving right
  • Yellow-green = moving down
  • Cyan = moving left
  • Violet = moving up
  • Dark = static / slow
  • Bright = fast motion
4. Methods: From Classical to Transformers

Four decades of progress: from variational calculus to attention mechanisms.

Method        Year   Category        Key Idea
Horn-Schunck  1981   Classical       Global smoothness constraint
Lucas-Kanade  1981   Classical       Local window, sparse tracking
Farneback     2003   Classical       Polynomial expansion, dense
FlowNet       2015   Deep Learning   First end-to-end CNN
FlowNet2      2017   Deep Learning   Stacked networks, warping
PWC-Net       2018   Deep Learning   Pyramid, warping, cost volume
RAFT          2020   Deep Learning   Iterative refinement, all-pairs
GMFlow        2022   Transformer     Global matching, cross-attention
FlowFormer    2022   Transformer     Cost volume tokenization
VideoFlow     2023   Transformer     Multi-frame context

Lucas-Kanade (1981)

Sparse, local window method
Assumes constant flow within a small window. Solves overdetermined system via least squares. Fast but only tracks sparse points.
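
The core solve is a few lines of linear algebra. A minimal sketch, assuming Ix, Iy, It are gradient patches cropped from one small window (e.g. 15x15):

import numpy as np

def lucas_kanade_window(Ix, Iy, It):
    # Stack the constraint Ix*u + Iy*v = -It for every pixel in the window
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # (N, 2)
    b = -It.ravel()                                  # (N,)
    # Least-squares solution of the overdetermined system
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v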

Horn-Schunck (1981)

Dense, global smoothness
Adds global smoothness constraint to brightness constancy. Produces dense flow but over-smooths motion boundaries.
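
For reference, a compact, unoptimized sketch of the classic Horn-Schunck iteration (assumed inputs: grayscale frames; alpha weights the smoothness term):

import numpy as np
import cv2

def horn_schunck(gray1, gray2, alpha=1.0, num_iters=100):
    Iy, Ix = np.gradient(gray1.astype(np.float32))
    It = gray2.astype(np.float32) - gray1.astype(np.float32)
    u = np.zeros_like(Ix)
    v = np.zeros_like(Ix)
    for _ in range(num_iters):
        # Neighborhood averages implement the smoothness coupling
        u_avg = cv2.blur(u, (3, 3))
        v_avg = cv2.blur(v, (3, 3))
        # Jacobi update derived from the Euler-Lagrange equations
        t = (Ix * u_avg + Iy * v_avg + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_avg - Ix * t
        v = v_avg - Iy * t
    return np.stack([u, v], axis=-1)   # dense flow, shape (H, W, 2)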

Farneback (2003)

Polynomial expansion
Approximates each neighborhood by polynomial expansion. Good balance of speed and quality. OpenCV default dense method.

FlowNet (2015)

First end-to-end CNN
Pioneered learning-based optical flow. Two architectures: FlowNetS (simple) and FlowNetC (correlation layer).

PWC-Net (2018)

Pyramid, Warping, Cost volume
Coarse-to-fine with warping and learnable cost volume. Much smaller and faster than FlowNet2.

RAFT (2020)

Recurrent All-Pairs Field Transforms
All-pairs correlation at 1/8 resolution. GRU-based iterative refinement. State-of-the-art accuracy, strong generalization.

GMFlow (2022)

Global Matching with Transformers
Reformulates flow as global matching via cross-attention. Better handles large motions and occlusions.

FlowFormer (2022)

Tokenized cost volume
Treats cost volume as tokens. Transformer encoder-decoder for flow regression. Top benchmark results.

VideoFlow (2023)

Multi-frame temporal context
Uses multiple frames for more stable, consistent flow. Handles occlusions better by leveraging temporal information.

RAFT Architecture (Current Standard)

I1, I2 (images) -> Feature Encoder (CNN backbone) -> All-Pairs Correlation (4D cost volume) -> GRU Iterations (refinement) -> Flow Output

RAFT builds a 4D correlation volume at 1/8 resolution, then iteratively refines the flow with a ConvGRU, typically running 12-32 iterations at inference and converging to accurate flow even for large motions.
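
torchvision ships a pretrained RAFT, so trying it takes a few lines. A minimal inference sketch (dummy tensors stand in for real frames; height and width must be divisible by 8):

import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights
from torchvision.utils import flow_to_image

weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval()

# (N, 3, H, W) image batches; the weights provide matching preprocessing
img1 = torch.rand(1, 3, 360, 640)
img2 = torch.rand(1, 3, 360, 640)
img1, img2 = weights.transforms()(img1, img2)

with torch.no_grad():
    flow_predictions = model(img1, img2)  # one estimate per GRU iteration
    flow = flow_predictions[-1]           # final refined flow, (N, 2, H, W)

flow_img = flow_to_image(flow)            # color-wheel visualization (uint8)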

5. Applications

Optical flow is a fundamental building block for motion understanding.

  • Video Stabilization: compensate for camera shake
  • Action Recognition: understand motion patterns
  • Object Tracking: follow objects across frames
  • Video Interpolation: generate intermediate frames
  • Video Compression: motion-compensated prediction
  • Autonomous Driving: ego-motion, obstacle detection

Video Stabilization

Estimate camera motion from optical flow, then apply inverse transform to compensate. Flow reveals global motion patterns that distinguish camera shake from object motion.

Pipeline: Compute flow -> Estimate homography/affine -> Smooth trajectory -> Warp frames
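
As a rough illustration of that pipeline, here is a per-frame-pair sketch (simplified: a real stabilizer accumulates transforms across the whole video and low-pass filters the trajectory before warping):

import cv2

def stabilize_step(prev_gray, curr_gray, frame):
    # 1. Sparse flow between consecutive frames
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                 qualityLevel=0.01, minDistance=20)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p0, None)
    p0, p1 = p0[status.flatten() == 1], p1[status.flatten() == 1]

    # 2. Estimate global (camera) motion as a partial affine transform
    M, _ = cv2.estimateAffinePartial2D(p0, p1)

    # 3. Invert it to compensate (no trajectory smoothing in this sketch)
    M_inv = cv2.invertAffineTransform(M)
    h, w = frame.shape[:2]
    return cv2.warpAffine(frame, M_inv, (w, h))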

Action Recognition

Flow provides motion features complementary to RGB appearance. Two-stream networks process RGB and flow separately, then fuse predictions.

Architectures: Two-Stream CNN, I3D, SlowFast (flow computed on-the-fly or pre-extracted)

Video Frame Interpolation

Generate intermediate frames by warping and blending based on bidirectional flow. Essential for slow-motion, frame rate conversion, and video compression.

Methods: FILM, RIFE, IFRNet - often use flow as intermediate representation
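
A toy version of flow-based interpolation, using only the forward flow under a linear-motion assumption (real methods add bidirectional flow, occlusion reasoning, and learned blending):

import cv2
import numpy as np

def interpolate_frame(frame1, frame2, flow, t=0.5):
    # flow: estimated flow from frame1 to frame2, shape (H, W, 2), float32
    h, w = flow.shape[:2]
    gx, gy = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    # Sample frame1 at x - t*flow and frame2 at x + (1-t)*flow, then blend
    w1 = cv2.remap(frame1, gx - t * flow[..., 0],
                   gy - t * flow[..., 1], cv2.INTER_LINEAR)
    w2 = cv2.remap(frame2, gx + (1 - t) * flow[..., 0],
                   gy + (1 - t) * flow[..., 1], cv2.INTER_LINEAR)
    return ((1 - t) * w1 + t * w2).astype(frame1.dtype)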

Object Tracking

Propagate object masks or bounding boxes using flow-based motion estimation. Combines well with detection for robust multi-object tracking.

Use: Mask warping in video segmentation (XMem, Cutie), KLT for feature tracking
6. Code Examples

From OpenCV classical methods to state-of-the-art deep learning models.

OpenCV Farneback (Classic Dense)
pip install opencv-python
import cv2
import numpy as np

# Read two consecutive frames
frame1 = cv2.imread('frame1.jpg')
frame2 = cv2.imread('frame2.jpg')

# Convert to grayscale
gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

# Calculate dense optical flow (Farneback method)
flow = cv2.calcOpticalFlowFarneback(
    gray1, gray2,
    None,
    pyr_scale=0.5,    # Pyramid scale
    levels=3,          # Number of pyramid levels
    winsize=15,        # Window size
    iterations=3,      # Iterations per level
    poly_n=5,          # Polynomial expansion neighborhood
    poly_sigma=1.2,    # Gaussian std for derivatives
    flags=0
)

# flow shape: (H, W, 2) - dx and dy per pixel
print(f'Flow shape: {flow.shape}')
print(f'Flow range: [{flow.min():.1f}, {flow.max():.1f}] pixels')

# Convert flow to HSV visualization
magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
hsv = np.zeros_like(frame1)
hsv[..., 0] = angle * 180 / np.pi / 2  # Hue = direction
hsv[..., 1] = 255                       # Saturation = max
hsv[..., 2] = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX)
flow_bgr = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)  # BGR image for imwrite

cv2.imwrite('flow_visualization.png', flow_bgr)

Quick Reference

For Real-Time / Classic
  • cv2.calcOpticalFlowFarneback
  • cv2.DISOpticalFlow
  • cv2.calcOpticalFlowPyrLK

For Best Accuracy
  • RAFT (torchvision)
  • FlowFormer
  • GMFlow

Key Benchmarks
  • Sintel (synthetic)
  • KITTI (driving)
  • Spring (high-res)

Use Cases

  • Video stabilization
  • Motion-based tracking
  • AR occlusion handling
  • Autonomy perception

Architectural Patterns

Correlation Volume CNNs

Cost volumes with iterative refinement (RAFT/FlowNet2).

Transformer Flow

Global matching with attention (GMFlow/FlowFormer).

Implementations

Open Source

RAFT (BSD 3-Clause): state-of-the-art accuracy and speed.

GMFlow (MIT): transformer-based dense matching.

FlowFormer (Apache 2.0): high-quality long-range flow.

Benchmarks

Quick Facts

Input: Image
Output: Structured Data
Implementations: 3 open source, 0 API
Patterns: 2 approaches

Have benchmark data?

Help us track the state of the art for optical flow.

Submit Results