Methodology

Transfer Learning

Transferring knowledge between tasks and domains.


Transfer learning applies knowledge learned on one task (source) to improve performance on another (target) — the foundation of practical ML since 2018. Pretraining on large datasets followed by fine-tuning on task-specific data is the universal recipe, with parameter-efficient methods (LoRA, adapters) making transfer increasingly efficient.

History

2014

Razavian et al. show pretrained CNN features transfer well to diverse vision tasks

2018

ULMFiT (Howard & Ruder) demonstrates effective transfer learning for text classification

2018

BERT and GPT establish pretrain-then-fine-tune as the standard NLP paradigm

2020

GPT-3 shows in-context learning as a form of transfer without fine-tuning

2021

LoRA (Low-Rank Adaptation) enables parameter-efficient fine-tuning of large models

2021

Prefix tuning, adapters, and prompt tuning offer alternative efficient transfer methods

2022

Foundation models make transfer learning the default — no one trains from scratch

2023

QLoRA enables fine-tuning of 65B-parameter models on a single 48 GB GPU

2024

Domain-adapted foundation models (BioGPT, CodeLlama, MedSAM) show value of continued pretraining

2025

Transfer learning is invisible — it's simply 'how ML works' in the foundation model era

How Transfer Learning Works

Transfer Learning Pipeline
1

Source Pretraining

A large model is pretrained on massive, general-purpose data (internet text, ImageNet, web-scale images) to learn general representations.

2

Domain Adaptation (optional)

The pretrained model is further trained on domain-specific unlabeled data (medical text, code, satellite images) to adapt representations.

3

Task-Specific Fine-Tuning

The adapted model is fine-tuned on labeled target task data — either full fine-tuning or parameter-efficient methods (LoRA, adapters).

4

Few-Shot / Zero-Shot Alternative

For LLMs, transfer happens via in-context learning (few-shot examples in the prompt) without any parameter updates.

5

Evaluation

Compare fine-tuned performance to training from scratch — the gap quantifies transfer benefit.
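The pipeline above can be sketched in miniature. In this toy NumPy example (all data and numbers are synthetic and illustrative), a frozen feature extractor stands in for a pretrained backbone (step 1), a logistic-regression head is trained on top of its features (step 3, linear probing), and the result is compared against the same head trained from scratch on raw inputs (step 5). The target task is XOR-like, so it is not linearly separable in the raw input space, but the "pretrained" representation exposes the feature the task needs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic target task: label = 1 iff x0 * x1 > 0 (XOR-like, not linearly
# separable in the raw 2-D input space).
X = rng.normal(size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

def pretrained_features(x):
    """Stand-in for a frozen pretrained backbone (step 1): its learned
    representation happens to expose the product feature the task needs."""
    return np.column_stack([x[:, 0] * x[:, 1], x[:, 0] ** 2, x[:, 1] ** 2])

def train_linear_head(feats, labels, steps=2000, lr=0.5):
    """Step 3: fit a logistic-regression head by gradient descent."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(steps):
        z = np.clip(feats @ w + b, -30, 30)  # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        grad = p - labels
        w -= lr * feats.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

def accuracy(feats, labels, w, b):
    return np.mean(((feats @ w + b) > 0) == labels)

# Transfer: freeze the backbone, train only the head (linear probing).
w_t, b_t = train_linear_head(pretrained_features(X_train), y_train)
transfer_acc = accuracy(pretrained_features(X_test), y_test, w_t, b_t)

# From scratch: the same head trained directly on raw inputs (step 5 baseline).
w_s, b_s = train_linear_head(X_train, y_train)
scratch_acc = accuracy(X_test, y_test, w_s, b_s)

print(f"transfer: {transfer_acc:.2f}  scratch: {scratch_acc:.2f}")
```

The gap between the two accuracies is exactly the "transfer benefit" step 5 describes: the scratch baseline hovers near chance, while the probe on pretrained features solves the task.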

Current Landscape

Transfer learning in 2025 is the foundation of all practical ML. The pretrain-then-fine-tune paradigm is so dominant that training from scratch is considered wasteful. Parameter-efficient fine-tuning (LoRA, QLoRA, adapters) has democratized transfer by enabling adaptation on consumer hardware. In-context learning provides zero-shot transfer for LLMs. The field has matured from 'should we transfer?' to 'how to transfer most efficiently?' — with LoRA rank, learning rate schedules, and data mixing being the key decisions.
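The LoRA-rank decision mentioned above comes down to one equation: the frozen pretrained weight W is augmented with a trainable low-rank update, W x + (alpha/r) * B A x, so only r * (d_in + d_out) parameters are trained instead of d_in * d_out. A minimal NumPy sketch (the dimensions, rank, and alpha are illustrative, not recommendations):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 1024, 1024, 8  # illustrative layer size and LoRA rank

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init so
                                       # the adapted model starts out identical
                                       # to the pretrained one

def adapted_forward(x, alpha=16.0):
    """Forward pass with the LoRA update: W x + (alpha / r) * B (A x)."""
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size
lora_params = A.size + B.size
print(f"trainable params: {lora_params} vs {full_params} "
      f"({100 * lora_params / full_params:.1f}%)")
```

Here the adapter trains roughly 1.6% of the layer's parameters; at higher rank r the update is more expressive but less efficient, which is the trade-off the text refers to.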

Key Challenges

Negative transfer — when source and target domains are too dissimilar, transfer can hurt performance

Catastrophic forgetting during fine-tuning — the model loses general capabilities when specialized for a narrow task

Choosing what to transfer — which layers to freeze, which to fine-tune, and how many parameters to add

Compute cost — even efficient transfer (LoRA) requires significant resources for large foundation models

Data contamination — when evaluation data is in pretraining data, transfer learning claims are inflated

Quick Recommendations

LLM adaptation

LoRA / QLoRA fine-tuning

Best parameter-efficiency for adapting LLMs to new domains/tasks

Vision transfer

DINOv2 features + linear probing or fine-tuning

Best general visual features for transfer to any vision task

Low-resource tasks

In-context learning with GPT-4/Claude

No fine-tuning needed; works with 0-5 examples

Domain-specific NLP

Continued pretraining (Llama + domain text) + LoRA fine-tuning

Two-stage transfer: domain adaptation then task adaptation
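The QLoRA recommendation above rests on one trick: the frozen base weights are stored in 4-bit precision while the small LoRA adapters stay in higher precision. QLoRA itself uses the NF4 format with double quantization; the sketch below uses plain blockwise absmax int4 (values stored in an int8 array, without bit-packing) purely to show the principle:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)  # a frozen base-weight block

def quantize_4bit(w, block=64):
    """Blockwise absmax quantization to 4-bit integers (-7..7): the idea
    behind storing frozen base weights cheaply in QLoRA-style fine-tuning.
    (Real implementations pack two 4-bit values per byte and use NF4.)"""
    flat = w.reshape(-1, block)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0  # per-block scale
    q = np.clip(np.round(flat / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale, shape):
    """Recover an approximate float weight for the forward pass."""
    return (q * scale).astype(np.float32).reshape(shape)

q, scale = quantize_4bit(W)
W_hat = dequantize(q, scale, W.shape)
err = np.abs(W - W_hat).max()
print(f"max abs reconstruction error: {err:.3f}")
```

The base weights shrink to roughly an eighth of their float32 size (plus one scale per block), the reconstruction error stays bounded by half a quantization step, and gradients during fine-tuning flow only into the higher-precision LoRA adapters.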

What's Next

The frontier is universal transfer — foundation models that adapt to any downstream task with minimal data and compute. Expect model merging (combining LoRA adapters trained on different tasks), continual transfer (adapting to sequences of tasks without forgetting), and automated transfer (choosing the optimal pretraining source and adaptation strategy automatically).

Benchmarks & SOTA

No datasets indexed for this task yet.
