Continual Learning
Learning new tasks without forgetting old ones.
Continual learning (also called lifelong learning) enables models to learn from non-stationary data streams without losing previous knowledge — the fundamental obstacle is catastrophic forgetting. The three main paradigms are replay-based methods, parameter isolation, and regularization, with experience replay remaining the most practical approach.
History
McCloskey & Cohen document catastrophic interference in neural networks
EWC (Elastic Weight Consolidation) by Kirkpatrick et al. — regularization-based continual learning
PackNet and Progressive Neural Networks use parameter isolation for task-specific subnetworks
Experience Replay variants (ER, MER, A-GEM) show simple replay buffers are highly effective
DER++ combines knowledge distillation with experience replay for a strong baseline
CORe50 and related benchmarks standardize continual learning evaluation
CLS-ER (Complementary Learning Systems) bridges neuroscience theory with replay methods
Continual pretraining of LLMs becomes practically important — how to update without forgetting
CLIP-based continual learning leverages pretrained features for zero-shot class-incremental learning
Foundation model fine-tuning as a continual learning problem — LoRA, adapters, and progressive specialization
How Continual Learning Works
Task Arrival
New data arrives sequentially — new classes, new domains, or distribution shifts — without access to all previous data simultaneously.
Replay / Regularization
To prevent forgetting: (1) replay stored examples from previous tasks, (2) regularize weights to stay near previous solutions (EWC), or (3) freeze/isolate task-specific parameters.
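Option (1) is usually implemented with a fixed-size buffer filled by reservoir sampling, so every example seen so far has an equal chance of being retained. A minimal sketch (the class name and tuple-based storage are illustrative; real implementations store tensors and interleave buffer samples into each training batch):

```python
import random

class ReplayBuffer:
    """Fixed-capacity buffer filled by reservoir sampling: after seeing
    n examples, each one is stored with probability capacity / n."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Replace a stored example with probability capacity / seen.
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.data[idx] = example

    def sample(self, k):
        # Draw a replay mini-batch (without replacement).
        k = min(k, len(self.data))
        return random.sample(self.data, k)
```

During training, each gradient step would use the current batch concatenated with `buffer.sample(k)`, which is essentially the ER baseline.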
Knowledge Consolidation
New knowledge is integrated with existing knowledge through distillation, feature alignment, or complementary learning systems.
Evaluation
Performance is measured on all tasks seen so far — both backward transfer (impact on old tasks) and forward transfer (benefit to new tasks).
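The standard way to report this is an accuracy matrix `R`, where `R[i, j]` is accuracy on task `j` after training on tasks `0..i`; final average accuracy and backward transfer (BWT, following the definitions popularized by the GEM paper) fall out directly. A sketch assuming a square accuracy matrix:

```python
import numpy as np

def cl_metrics(R):
    """R[i, j] = accuracy on task j after training through task i.
    Returns (final average accuracy, backward transfer).
    Negative BWT indicates forgetting of earlier tasks."""
    R = np.asarray(R, dtype=float)
    T = R.shape[0]
    avg_acc = R[-1].mean()  # accuracy over all tasks after the last one
    # BWT: how much final accuracy on each old task differs from its
    # accuracy right after it was learned.
    bwt = np.mean([R[-1, j] - R[j, j] for j in range(T - 1)])
    return avg_acc, bwt
```

Forward transfer is computed analogously from the upper triangle of `R`, but it additionally needs a random-initialization baseline per task, so it is omitted here.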
Current Landscape
Continual learning in 2025 is gaining practical importance as foundation models need updating without catastrophic forgetting. The research community has established that simple replay methods (ER, DER++) are strong baselines that are hard to beat consistently. The field is shifting from toy benchmarks (permuted MNIST) to practical scenarios: continual pretraining of LLMs, incremental class learning with pretrained features, and domain adaptation over time. The connection to foundation model fine-tuning (LoRA, adapters) is creating new synergies between continual learning theory and practical model management.
Key Challenges
Catastrophic forgetting — learning new tasks overwrites weights important for old tasks
Task boundary assumption — many methods assume clear task boundaries, which are absent in real-world data streams
Memory constraints — replay buffers consume memory that scales with the number of tasks
Evaluation complexity — comparing methods requires standardized task sequences, which are hard to design fairly
Scalability — most continual learning research uses small datasets (CIFAR, MNIST); scaling to foundation models is different
Quick Recommendations
Practical continual learning
Experience Replay (ER / DER++)
Simple, effective, and consistently competitive across benchmarks
Foundation model updating
LoRA + selective replay
Efficient fine-tuning that preserves base model capabilities
Memory-constrained
EWC / SI (Synaptic Intelligence)
Regularization-based approaches require no replay buffer
Class-incremental learning
FOSTER / MEMO with CLIP features
Leverages pretrained features for incremental class learning
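The EWC/SI recommendation above avoids a replay buffer by adding a quadratic penalty that anchors each parameter to its previous-task value, weighted by an importance estimate (a diagonal Fisher information approximation in EWC). A minimal sketch (dictionary-of-arrays parameters and the `lam` strength are illustrative):

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """EWC regularizer (sketch): (lam / 2) * sum_i F_i (theta_i - theta*_i)^2.
    `fisher` holds per-parameter importance weights estimated on the
    previous task; important weights are pulled strongly back toward
    their old values, unimportant ones are free to move."""
    total = 0.0
    for name in params:
        diff = params[name] - old_params[name]
        total += (fisher[name] * diff ** 2).sum()
    return 0.5 * lam * total
```

In training, this penalty is added to the new task's loss, so memory cost is a stored copy of the old parameters plus one importance value per parameter, independent of the number of examples seen.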
What's Next
The frontier is continual learning for foundation models — updating LLMs, VLMs, and other large models with new knowledge without degrading existing capabilities. Expect methods that combine efficient fine-tuning (LoRA) with intelligent replay and knowledge consolidation, and practical tools for managing model versions across continuous data streams.
Benchmarks & SOTA
No datasets indexed for this task yet.