Transfer Learning
Transferring knowledge between tasks and domains.
Transfer learning applies knowledge learned on one task (source) to improve performance on another (target) — the foundation of practical ML since 2018. Pretraining on large datasets followed by fine-tuning on task-specific data is the universal recipe, with parameter-efficient methods (LoRA, adapters) making transfer increasingly efficient.
History
2014: Razavian et al. show that pretrained CNN features transfer well to diverse vision tasks
2018: ULMFiT (Howard & Ruder) demonstrates effective transfer learning for text classification
2018: BERT and GPT establish pretrain-then-fine-tune as the standard NLP paradigm
2020: GPT-3 shows in-context learning as a form of transfer without fine-tuning
2021: LoRA (Low-Rank Adaptation) enables parameter-efficient fine-tuning of large models
2019-2021: Adapters, prefix tuning, and prompt tuning offer alternative parameter-efficient transfer methods
2022: Foundation models make transfer learning the default — few practitioners train from scratch
2023: QLoRA enables fine-tuning 65B-parameter models on a single 48GB GPU
2022-2024: Domain-adapted foundation models (BioGPT, Code Llama, MedSAM) show the value of continued pretraining
Transfer learning is invisible — it's simply 'how ML works' in the foundation model era
How Transfer Learning Works
Source Pretraining
A large model is pretrained on massive, general-purpose data (internet text, ImageNet, web-scale images) to learn general representations.
Domain Adaptation (optional)
The pretrained model is further trained on domain-specific unlabeled data (medical text, code, satellite images) to adapt representations.
Task-Specific Fine-Tuning
The adapted model is fine-tuned on labeled target task data — either full fine-tuning or parameter-efficient methods (LoRA, adapters).
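The low-rank update behind LoRA can be sketched in a few lines of NumPy. This is an illustrative toy, not a real training setup: the dimensions, rank, and scaling value are arbitrary, and in practice libraries such as Hugging Face PEFT handle this inside each attention/MLP layer.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 512, 512, 8   # illustrative layer dimensions; LoRA rank r << d
alpha = 16                     # LoRA scaling hyperparameter (illustrative value)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable low-rank "down" projection
B = np.zeros((d_out, r))                    # trainable "up" projection, zero-init
                                            # so the delta starts at exactly zero

def lora_forward(x):
    # y = W x + (alpha / r) * B A x
    # Only A and B receive gradients during fine-tuning; W stays frozen.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = lora_forward(x)

full_params = W.size
lora_params = A.size + B.size
print(f"trainable params: {lora_params} vs full fine-tune {full_params} "
      f"({100 * lora_params / full_params:.1f}%)")
```

Because `B` is zero-initialized, the adapted layer starts out identical to the pretrained one, and the trainable parameter count scales with the rank `r` rather than with the full weight matrix.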
Few-Shot / Zero-Shot Alternative
For LLMs, transfer happens via in-context learning (few-shot examples in the prompt) without any parameter updates.
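In-context transfer amounts to formatting a handful of labeled demonstrations into the prompt itself. A minimal sketch, assuming a hypothetical sentiment-classification task; the label set and prompt format are illustrative, not a specific model's API:

```python
def build_few_shot_prompt(examples, query):
    """Turn (text, label) demonstration pairs plus a new query into a
    few-shot prompt for in-context learning. Format is illustrative."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The model is expected to continue the pattern for the new query.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

demos = [("Great battery life.", "positive"),
         ("Screen cracked in a week.", "negative")]
prompt = build_few_shot_prompt(demos, "Fast shipping and works perfectly.")
print(prompt)
```

No parameters are updated: the "transfer" is entirely in conditioning the pretrained model on the demonstrations.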
Evaluation
Compare fine-tuned performance to training from scratch — the gap quantifies transfer benefit.
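The comparison above reduces to simple arithmetic. A sketch with made-up accuracy numbers, not measured results:

```python
def transfer_gain(fine_tuned_acc, from_scratch_acc):
    """Absolute and relative benefit of transfer over training from scratch."""
    absolute = fine_tuned_acc - from_scratch_acc
    relative = absolute / from_scratch_acc
    return absolute, relative

# Illustrative numbers only.
abs_gain, rel_gain = transfer_gain(fine_tuned_acc=0.91, from_scratch_acc=0.78)
print(f"absolute gain: {abs_gain:.2f}, relative gain: {rel_gain:.1%}")
```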
Current Landscape
Transfer learning in 2025 is the foundation of all practical ML. The pretrain-then-fine-tune paradigm is so dominant that training from scratch is considered wasteful. Parameter-efficient fine-tuning (LoRA, QLoRA, adapters) has democratized transfer by enabling adaptation on consumer hardware. In-context learning provides zero-shot transfer for LLMs. The field has matured from 'should we transfer?' to 'how to transfer most efficiently?' — with LoRA rank, learning rate schedules, and data mixing being the key decisions.
Key Challenges
Negative transfer — when source and target domains are too dissimilar, transfer can hurt performance
Catastrophic forgetting during fine-tuning — the model loses general capabilities when specialized for a narrow task
Choosing what to transfer — which layers to freeze, which to fine-tune, and how many parameters to add
Compute cost — even efficient transfer (LoRA) requires significant resources for large foundation models
Data contamination — when evaluation data is in pretraining data, transfer learning claims are inflated
Quick Recommendations
LLM adaptation
LoRA / QLoRA fine-tuning
Best parameter-efficiency for adapting LLMs to new domains/tasks
Vision transfer
DINOv2 features + linear probing or fine-tuning
Best general visual features for transfer to any vision task
Low-resource tasks
In-context learning with GPT-4/Claude
No fine-tuning needed; works with 0-5 examples
Domain-specific NLP
Continued pretraining (Llama + domain text) + LoRA fine-tuning
Two-stage transfer: domain adaptation then task adaptation
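The linear-probing recommendation above can be sketched end to end: extract features with a frozen backbone, then train a single logistic-regression layer on top. Here synthetic two-cluster data stands in for real backbone embeddings (e.g. from DINOv2) so the example is self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen backbone features; in practice these would come
# from a pretrained encoder applied to your images.
n, d = 200, 64
X = np.concatenate([rng.standard_normal((n, d)) + 1.0,
                    rng.standard_normal((n, d)) - 1.0])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Linear probe: logistic regression trained by gradient descent,
# with the "backbone" (feature extractor) entirely frozen.
w, b = np.zeros(d), 0.0
lr = 0.1
for _ in range(200):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # sigmoid predictions
    grad_w = X.T @ (p - y) / len(y)
    grad_b = (p - y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

acc = (((X @ w + b) > 0) == y).mean()
print(f"linear-probe train accuracy: {acc:.2f}")
```

Linear probing is a cheap first check of feature quality; if the probe underperforms, fine-tuning some or all backbone layers is the usual next step.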
What's Next
The frontier is universal transfer — foundation models that adapt to any downstream task with minimal data and compute. Expect model merging (combining LoRA adapters trained on different tasks), continual transfer (adapting to sequences of tasks without forgetting), and automated transfer (choosing the optimal pretraining source and adaptation strategy automatically).
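One of the simplest model-merging schemes mentioned above is a weighted average of the low-rank deltas from adapters trained on different tasks. A minimal NumPy sketch with randomly generated stand-in adapters (real merging operates per layer and often uses more careful weighting):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4

W = rng.standard_normal((d_out, d_in))  # shared frozen base weight

# Two hypothetical LoRA adapters, imagined as trained on different tasks.
A1, B1 = rng.standard_normal((r, d_in)), rng.standard_normal((d_out, r))
A2, B2 = rng.standard_normal((r, d_in)), rng.standard_normal((d_out, r))

# Naive merge: interpolate the two low-rank updates, then fold into the base.
lam = 0.5
merged_delta = lam * (B1 @ A1) + (1 - lam) * (B2 @ A2)
W_merged = W + merged_delta
print("merged weight shape:", W_merged.shape)
```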
Benchmarks & SOTA
No datasets indexed for this task yet.