
Optical Flow

Estimate pixel-wise motion between frames. Useful for video editing, stabilization, and robotics.

How Optical Flow Works

A deep-dive into optical flow: the art of perceiving motion from images. From classical variational methods to modern transformer architectures.

1. What is Optical Flow?

Optical flow captures apparent motion between consecutive video frames. For each pixel, it answers: "Where did this pixel move to in the next frame?"

Motion Between Frames

[Interactive demo: toggling between frame t and frame t+1 shows how objects shift position. The blue box moves right and down, the green circle moves left, and the orange triangle stays static (no flow).]

The Flow Field

Optical flow produces a 2D vector field. Each arrow shows where that pixel "moves."

Output format: For an image of size H x W, optical flow produces a tensor of shape (H, W, 2) where each pixel has a (dx, dy) displacement vector.
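
To make this format concrete, here is a minimal sketch (placeholder arrays stand in for real frames and flow) that uses the per-pixel (dx, dy) vectors to backward-warp the second frame with cv2.remap; with accurate flow, the result approximates the first frame wherever brightness constancy holds:

import numpy as np
import cv2

# Placeholder data -- substitute real frames and estimated flow
H, W = 240, 320
frame2 = np.random.rand(H, W).astype(np.float32)   # second frame (grayscale)
flow = np.zeros((H, W, 2), dtype=np.float32)       # flow[y, x] = (dx, dy)

# Backward warping: reconstruct frame 1 by sampling frame 2 at (x+dx, y+dy)
grid_x, grid_y = np.meshgrid(np.arange(W), np.arange(H))
map_x = (grid_x + flow[..., 0]).astype(np.float32)
map_y = (grid_y + flow[..., 1]).astype(np.float32)
frame1_approx = cv2.remap(frame2, map_x, map_y, cv2.INTER_LINEAR)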

The Brightness Constancy Assumption

Classical optical flow methods rest on a fundamental assumption: a pixel's intensity does not change as it moves. If pixel (x, y) has intensity I at time t, then at time t+1, the same point (now at position x+dx, y+dy) should have the same intensity.

Brightness Constancy Equation:
I(x, y, t) = I(x + dx, y + dy, t + dt)
Linearized (first-order Taylor expansion):
Ix*u + Iy*v + It = 0
where Ix, Iy, It are the spatial and temporal image gradients, and u = dx/dt, v = dy/dt are the flow components
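
The linearized constraint is easy to verify numerically. This small self-contained check (a synthetic example, not from any library) shifts a smooth image by a known sub-pixel motion (u, v) and confirms that Ix*u + Iy*v + It stays close to zero:

import numpy as np

H, W = 64, 64
y, x = np.mgrid[0:H, 0:W].astype(np.float64)
I0 = np.sin(x / 8.0) + np.cos(y / 11.0)               # frame at time t
u, v = 0.3, -0.2                                       # known motion
I1 = np.sin((x - u) / 8.0) + np.cos((y - v) / 11.0)    # frame at t+1

Iy_, Ix_ = np.gradient(I0)   # spatial gradients (axis 0 = y, axis 1 = x)
It_ = I1 - I0                # temporal gradient

residual = Ix_ * u + Iy_ * v + It_
print(f'mean |residual|: {np.abs(residual).mean():.4f}')  # near zero
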
When This Assumption Fails
  • Lighting changes (shadows, reflections)
  • Object occlusion/disocclusion
  • Transparent or reflective surfaces
  • Motion blur
  • Large displacements between frames

Modern Solutions
  • Deep learning: learns robust features
  • Cost volumes: explicit matching
  • Multi-scale pyramids: handle large motion
  • Occlusion detection networks
2. Dense vs Sparse Optical Flow

Two fundamentally different approaches: compute flow for every pixel (dense) or only for selected keypoints (sparse).

Dense Flow: compute a flow vector for every pixel.

Sparse Flow: track only selected feature points (keypoints).

Dense Flow

Output: full motion field (H x W x 2)

Advantages
  • Complete motion information
  • Smooth visualization
  • Better for segmentation

Disadvantages
  • Computationally expensive
  • Slower inference

Example methods: Farneback, FlowNet, RAFT, GMFlow

When to use: video editing, segmentation, action recognition, any task needing complete motion information.
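
For contrast with the dense Farneback example in the code section below, here is a minimal sparse-flow sketch using OpenCV's pyramidal Lucas-Kanade tracker (frame file names are hypothetical):

import cv2

gray1 = cv2.imread('frame1.jpg', cv2.IMREAD_GRAYSCALE)
gray2 = cv2.imread('frame2.jpg', cv2.IMREAD_GRAYSCALE)

# Select Shi-Tomasi corners worth tracking
p0 = cv2.goodFeaturesToTrack(gray1, maxCorners=200,
                             qualityLevel=0.01, minDistance=10)

# Track the keypoints into the next frame (sparse flow)
p1, status, err = cv2.calcOpticalFlowPyrLK(gray1, gray2, p0, None,
                                           winSize=(21, 21), maxLevel=3)

# Keep only the successfully tracked points
good_old = p0[status.flatten() == 1]
good_new = p1[status.flatten() == 1]
print(f'Tracked {len(good_new)} of {len(p0)} keypoints')
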
3. Visualizing Flow: The Color Wheel

Flow vectors are typically visualized with an HSV color encoding: hue encodes direction and value (brightness) encodes magnitude.

Direction Color Wheel

[Color wheel: hue maps motion direction: right (0°), down (90°), left (180°), up (270°).]

HSV Encoding Formula

// Compute angle and magnitude
angle = atan2(dy, dx)          // in (-pi, pi]; wrap negatives to [0, 2*pi)
magnitude = sqrt(dx^2 + dy^2)
// Map to HSV
H = angle / (2*pi) * 360       // hue = direction
S = 1.0                        // full saturation
V = normalize(magnitude)       // value = speed

Magnitude Encoding


Brighter colors indicate faster motion. Black or dark regions indicate little to no movement.

Reading Flow Visualizations (for the HSV encoding above)
  • Red = moving right
  • Yellow-green = moving down
  • Cyan = moving left
  • Violet = moving up
  • Dark = static / slow
  • Bright = fast motion
4. Methods: From Classical to Transformers

Four decades of progress: from variational calculus to attention mechanisms.

Method        Year   Category        Key Idea
Horn-Schunck  1981   Classical       Global smoothness constraint
Lucas-Kanade  1981   Classical       Local window, sparse tracking
Farneback     2003   Classical       Polynomial expansion, dense
FlowNet       2015   Deep Learning   First end-to-end CNN
FlowNet2      2017   Deep Learning   Stacked networks, warping
PWC-Net       2018   Deep Learning   Pyramid, warping, cost volume
RAFT          2020   Deep Learning   Iterative refinement, all-pairs
GMFlow        2022   Transformer     Global matching, cross-attention
FlowFormer    2022   Transformer     Cost volume tokenization
VideoFlow     2023   Transformer     Multi-frame context

Lucas-Kanade (1981)

Sparse, local window method
Assumes constant flow within a small window. Solves overdetermined system via least squares. Fast but only tracks sparse points.
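
The core solve is a few lines of linear algebra. A minimal sketch, assuming Ix, Iy, It are gradient patches cropped from one small window (e.g. 15x15):

import numpy as np

def lucas_kanade_window(Ix, Iy, It):
    # Stack the constraint Ix*u + Iy*v = -It for every pixel in the window
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # (N, 2)
    b = -It.ravel()                                  # (N,)
    # Least-squares solution of the overdetermined system
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v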

Horn-Schunck (1981)

Dense, global smoothness
Adds global smoothness constraint to brightness constancy. Produces dense flow but over-smooths motion boundaries.
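
For reference, a compact, unoptimized sketch of the classic Horn-Schunck iteration (assumed inputs: grayscale frames; alpha weights the smoothness term):

import numpy as np
import cv2

def horn_schunck(gray1, gray2, alpha=1.0, num_iters=100):
    Iy, Ix = np.gradient(gray1.astype(np.float32))
    It = gray2.astype(np.float32) - gray1.astype(np.float32)
    u = np.zeros_like(Ix)
    v = np.zeros_like(Ix)
    for _ in range(num_iters):
        # Neighborhood averages implement the smoothness coupling
        u_avg = cv2.blur(u, (3, 3))
        v_avg = cv2.blur(v, (3, 3))
        # Jacobi update derived from the Euler-Lagrange equations
        t = (Ix * u_avg + Iy * v_avg + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_avg - Ix * t
        v = v_avg - Iy * t
    return np.stack([u, v], axis=-1)   # dense flow, shape (H, W, 2)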

Farneback (2003)

Polynomial expansion
Approximates each neighborhood by polynomial expansion. Good balance of speed and quality. OpenCV default dense method.

FlowNet (2015)

First end-to-end CNN
Pioneered learning-based optical flow. Two architectures: FlowNetS (simple) and FlowNetC (correlation layer).

PWC-Net (2018)

Pyramid, Warping, Cost volume
Coarse-to-fine with warping and learnable cost volume. Much smaller and faster than FlowNet2.

RAFT (2020)

Recurrent All-Pairs Field Transforms
All-pairs correlation at 1/8 resolution. GRU-based iterative refinement. State-of-the-art accuracy, strong generalization.

GMFlow (2022)

Global Matching with Transformers
Reformulates flow as global matching via cross-attention. Better handles large motions and occlusions.

FlowFormer (2022)

Tokenized cost volume
Treats cost volume as tokens. Transformer encoder-decoder for flow regression. Top benchmark results.

VideoFlow (2023)

Multi-frame temporal context
Uses multiple frames for more stable, consistent flow. Handles occlusions better by leveraging temporal information.

RAFT Architecture (Current Standard)

I1, I2 (images) -> Feature Encoder (CNN backbone) -> All-Pairs Correlation (4D cost volume) -> GRU Iterations (refinement) -> Flow Output

RAFT builds a 4D correlation volume at 1/8 resolution, then iteratively refines the flow with a ConvGRU, typically running 12-32 iterations at inference and converging to accurate flow even for large motions.
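
torchvision ships a pretrained RAFT, so trying it takes a few lines. A minimal inference sketch (dummy tensors stand in for real frames; height and width must be divisible by 8):

import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights
from torchvision.utils import flow_to_image

weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval()

# (N, 3, H, W) image batches; the weights provide matching preprocessing
img1 = torch.rand(1, 3, 360, 640)
img2 = torch.rand(1, 3, 360, 640)
img1, img2 = weights.transforms()(img1, img2)

with torch.no_grad():
    flow_predictions = model(img1, img2)  # one estimate per GRU iteration
    flow = flow_predictions[-1]           # final refined flow, (N, 2, H, W)

flow_img = flow_to_image(flow)            # color-wheel visualization (uint8)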

5. Applications

Optical flow is a fundamental building block for motion understanding.

  • Video Stabilization: compensate for camera shake
  • Action Recognition: understand motion patterns
  • Object Tracking: follow objects across frames
  • Video Interpolation: generate intermediate frames
  • Video Compression: motion-compensated prediction
  • Autonomous Driving: ego-motion, obstacle detection

Video Stabilization

Estimate camera motion from optical flow, then apply inverse transform to compensate. Flow reveals global motion patterns that distinguish camera shake from object motion.

Pipeline: Compute flow -> Estimate homography/affine -> Smooth trajectory -> Warp frames
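
As a rough illustration of that pipeline, here is a per-frame-pair sketch (simplified: a real stabilizer accumulates transforms across the whole video and low-pass filters the trajectory before warping):

import cv2

def stabilize_step(prev_gray, curr_gray, frame):
    # 1. Sparse flow between consecutive frames
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                 qualityLevel=0.01, minDistance=20)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p0, None)
    p0, p1 = p0[status.flatten() == 1], p1[status.flatten() == 1]

    # 2. Estimate global (camera) motion as a partial affine transform
    M, _ = cv2.estimateAffinePartial2D(p0, p1)

    # 3. Invert it to compensate (no trajectory smoothing in this sketch)
    M_inv = cv2.invertAffineTransform(M)
    h, w = frame.shape[:2]
    return cv2.warpAffine(frame, M_inv, (w, h))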

Action Recognition

Flow provides motion features complementary to RGB appearance. Two-stream networks process RGB and flow separately, then fuse predictions.

Architectures: Two-Stream CNN, I3D, SlowFast (flow computed on-the-fly or pre-extracted)

Video Frame Interpolation

Generate intermediate frames by warping and blending based on bidirectional flow. Essential for slow-motion, frame rate conversion, and video compression.

Methods: FILM, RIFE, IFRNet - often use flow as intermediate representation
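
A toy version of flow-based interpolation, using only the forward flow under a linear-motion assumption (real methods add bidirectional flow, occlusion reasoning, and learned blending):

import cv2
import numpy as np

def interpolate_frame(frame1, frame2, flow, t=0.5):
    # flow: estimated flow from frame1 to frame2, shape (H, W, 2), float32
    h, w = flow.shape[:2]
    gx, gy = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    # Sample frame1 at x - t*flow and frame2 at x + (1-t)*flow, then blend
    w1 = cv2.remap(frame1, gx - t * flow[..., 0],
                   gy - t * flow[..., 1], cv2.INTER_LINEAR)
    w2 = cv2.remap(frame2, gx + (1 - t) * flow[..., 0],
                   gy + (1 - t) * flow[..., 1], cv2.INTER_LINEAR)
    return ((1 - t) * w1 + t * w2).astype(frame1.dtype)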

Object Tracking

Propagate object masks or bounding boxes using flow-based motion estimation. Combines well with detection for robust multi-object tracking.

Use: Mask warping in video segmentation (XMem, Cutie), KLT for feature tracking
6. Code Examples

From OpenCV classical methods to state-of-the-art deep learning models.

OpenCV Farneback (Classic Dense)
pip install opencv-python
import cv2
import numpy as np

# Read two consecutive frames
frame1 = cv2.imread('frame1.jpg')
frame2 = cv2.imread('frame2.jpg')

# Convert to grayscale
gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

# Calculate dense optical flow (Farneback method)
flow = cv2.calcOpticalFlowFarneback(
    gray1, gray2,
    None,
    pyr_scale=0.5,    # Pyramid scale
    levels=3,          # Number of pyramid levels
    winsize=15,        # Window size
    iterations=3,      # Iterations per level
    poly_n=5,          # Polynomial expansion neighborhood
    poly_sigma=1.2,    # Gaussian std for derivatives
    flags=0
)

# flow shape: (H, W, 2) - dx and dy per pixel
print(f'Flow shape: {flow.shape}')
print(f'Flow range: [{flow.min():.1f}, {flow.max():.1f}] pixels')

# Convert flow to HSV visualization
magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
hsv = np.zeros_like(frame1)
hsv[..., 0] = angle * 180 / np.pi / 2  # Hue = direction
hsv[..., 1] = 255                       # Saturation = max
hsv[..., 2] = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX)
flow_bgr = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)  # BGR image for imwrite

cv2.imwrite('flow_visualization.png', flow_bgr)

Quick Reference

For Real-Time / Classic
  • cv2.calcOpticalFlowFarneback
  • cv2.DISOpticalFlow
  • cv2.calcOpticalFlowPyrLK

For Best Accuracy
  • RAFT (torchvision)
  • FlowFormer
  • GMFlow

Key Benchmarks
  • Sintel (synthetic)
  • KITTI (driving)
  • Spring (high-res)

Use Cases

  • Video stabilization
  • Motion-based tracking
  • AR occlusion handling
  • Autonomy perception

Architectural Patterns

Correlation Volume CNNs

Cost volumes with iterative refinement (RAFT/FlowNet2).

Transformer Flow

Global matching with attention (GMFlow/FlowFormer).

Implementations

Open Source

RAFT (BSD 3-Clause): state-of-the-art accuracy and speed.

GMFlow (MIT): transformer-based dense matching.

FlowFormer (Apache 2.0): high-quality long-range flow.

Benchmarks

Quick Facts

Input: Image
Output: Structured Data
Implementations: 3 open source, 0 API
Patterns: 2 approaches

Have benchmark data?

Help us track the state of the art for optical flow.

Submit Results