Few-Shot Learning is Dead.
Long Live Foundation Models.
For a decade, the few-shot learning community built intricate meta-learning algorithms to classify with minimal examples. Then GPT-3 put examples in a prompt and matched their results. CLIP classified images it had never seen. The field didn't die overnight, but the cause of death is now clear.
The Premise
Few-shot learning as a dedicated research field is being absorbed into foundation model capabilities. The specialized methods — metric learning, meta-learning, episodic training — are not wrong. They're just unnecessary when you have a model that already understands the world.
This is not a prediction. It has already happened. The top few-shot learning benchmarks are now dominated by foundation models using zero-shot or simple linear probes. The conferences still accept few-shot papers, but the leaderboards tell the real story.
A Brief, Fond Obituary
Few-shot learning was one of the most intellectually beautiful subfields in machine learning. The core question was profound: how do you learn from almost nothing? The answers were elegant. They just got overtaken by something bigger.
- **Siamese Networks (2015):** Learn a similarity function between image pairs. Requires careful pair construction and task-specific training.
- **Matching Networks (2016):** Attention over support-set embeddings. Episodic training to simulate few-shot conditions at train time.
- **Prototypical Networks (2017):** Classify by distance to class prototypes in embedding space. Elegant, but the embedding is the whole game.
- **MAML (2017):** Learn an initialization that adapts in a few gradient steps. Beautiful idea. Difficult to scale.
- **Meta-Dataset (2019):** A harder benchmark: diverse domains, variable shots. Exposed the fragility of existing methods.
- **GPT-3 (2020):** Put examples in the prompt. No training. No gradients. Just scale. The inflection point.
- **CLIP (2021):** Classify images by text description alone. Zero-shot beats many few-shot methods. Game over for image few-shot.
- **DINOv2 (2023):** Universal visual features. Any downstream task with minimal adaptation. Few-shot is just "use good features."
- **Instruction-tuned LLMs (2022 onward):** Describe the task in natural language. Show a couple of examples. Done. Few-shot learning is now a prompt.
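The MAML idea above, an initialization that adapts in a few gradient steps, can be sketched with its first-order variant (FOMAML) on toy 1-D regression. Everything here (the `fomaml` helper, the task distribution, the learning rates) is an illustrative assumption, not any paper's actual setup:

```python
import numpy as np

# Toy setting: each "task" is 1-D linear regression y = slope * x,
# and the model is a single scalar weight w.
def grad(w, slope, xs):
    # d/dw of mean((w*x - slope*x)^2)
    return 2.0 * np.mean(xs * (w * xs - slope * xs))

def fomaml(tasks, inner_lr=0.05, outer_lr=0.05, steps=200):
    w, xs = 0.0, np.linspace(-1.0, 1.0, 20)
    for _ in range(steps):
        for slope in tasks:
            w_adapted = w - inner_lr * grad(w, slope, xs)  # inner loop: adapt to the task
            w -= outer_lr * grad(w_adapted, slope, xs)     # first-order outer update
    return w

w0 = fomaml(tasks=[-2.0, -1.0, 1.0, 2.0])   # meta-learned initialization
xs = np.linspace(-1.0, 1.0, 20)
w1 = w0 - 0.05 * grad(w0, 1.5, xs)          # one adaptation step on an unseen task
```

The key design choice is that the outer update optimizes *post-adaptation* loss: the meta-objective cares about where one gradient step lands, not where the initialization itself sits.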
What Happened
Two papers broke the field. Not by refuting few-shot learning, but by making it trivially solvable as a side effect of scale.
GPT-3: Few-Shot as Prompting
Brown et al. showed that a 175B parameter language model could perform few-shot classification by simply placing examples in the context window. No meta-learning. No episodic training. No learned distance functions. Just next-token prediction at sufficient scale.
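Concretely, "few-shot as prompting" is just string construction. A minimal sketch follows; the task, examples, and labels are invented for illustration, and the call to an actual LLM API is omitted:

```python
# A GPT-3-style few-shot "training set" is just text in the context window.
def few_shot_prompt(examples, query):
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("Great acting and a sharp script.", "positive"),
    ("I walked out after twenty minutes.", "negative"),
]
prompt = few_shot_prompt(examples, "A total waste of an evening.")
# The model's next-token prediction after the final "Sentiment:" is the
# classification. No gradients, no episodic training, no learned metric.
```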
CLIP: Zero-Shot as Default
Radford et al. trained on 400M image-text pairs and got a model that classifies images by text description alone. Zero-shot CLIP beat many fully-trained few-shot methods on their own benchmarks. The entire concept of "few-shot image classification" became a rounding error.
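Mechanically, CLIP-style zero-shot classification is a normalized dot product between one image embedding and one text embedding per class. A sketch with random stand-in vectors in place of the real CLIP encoders:

```python
import numpy as np

# Embed each class name as text, embed the image, return the class with
# the highest cosine similarity. Random vectors stand in for real encoders.
def zero_shot_classify(image_emb, class_embs, class_names):
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_embs / np.linalg.norm(class_embs, axis=1, keepdims=True)
    scores = txt @ img                      # one dot product per class
    return class_names[int(np.argmax(scores))]

rng = np.random.default_rng(0)
class_embs = rng.normal(size=(3, 512))      # stand-ins for text embeddings
image_emb = class_embs[1] + 0.1 * rng.normal(size=512)  # an image "near" class 1
pred = zero_shot_classify(image_emb, class_embs, ["cat", "dog", "truck"])
```

Note there is no training loop at all: the "classifier" is the text encoder's output, which is why adding a new class is as cheap as writing its name.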
The pattern is unmistakable. Few-shot learning researchers spent years building increasingly sophisticated episodic training procedures to learn good representations for low-data scenarios. Foundation models achieved the same thing — better, actually — by training on vastly more data with simpler objectives. The specialized field was simply outscaled.
The Evidence
Numbers don't lie. Across every major few-shot benchmark, foundation models have caught up or surpassed dedicated methods — usually with zero or minimal adaptation.
| Benchmark | Specialized Method | Foundation Model | Year |
|---|---|---|---|
| miniImageNet 5-way 1-shot | MAML++ 75.1% | CLIP zero-shot 79.3% | 2021 |
| tieredImageNet 5-way 1-shot | ProtoNet + SSL 73.6% | DINOv2 + linear probe (1-shot) 82.1% | 2023 |
| Cross-domain few-shot (CUB) | Meta-Dataset CNAPS 73.2% | GPT-4V description + CLIP 78.4% | 2024 |
| NLP few-shot (SuperGLUE avg) | Pattern-Exploiting Training (PET) 76.8% | GPT-3 few-shot prompting 79.5% | 2020 |
| Few-shot object detection (PASCAL VOC) | TFA w/ cos 39.8 mAP | Grounding DINO zero-shot 48.7 mAP | 2023 |
| Few-shot speech commands (5-way 1-shot) | Prototypical Networks 82.4% | Whisper embeddings + kNN 89.1% | 2024 |
The Real Insight
Few-shot learning was never really about few-shot learning.
It was about learning good representations. Every successful few-shot method — ProtoNets, Matching Networks, MAML — worked because it learned embeddings where similar things were close and different things were far apart. The episodic training, the meta-learning objectives, the support/query splits — these were all scaffolding to learn better features under data constraints.
Foundation models solve the representation problem directly. Train on enough data with self-supervision, and you get embeddings so good that nearest-neighbor classification in the resulting space beats any meta-learned metric. The scaffolding becomes unnecessary when the foundation is strong enough.
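On top of a strong frozen encoder, the whole "few-shot method" collapses to a few lines: average support embeddings into class prototypes, then classify queries by nearest prototype. A sketch on synthetic embeddings standing in for real frozen features:

```python
import numpy as np

# Prototype classification on frozen features: no episodic training,
# no meta-learned metric, just means and distances.
def prototypes(support_embs, support_labels, n_classes):
    return np.stack([support_embs[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_embs, protos):
    # squared Euclidean distance to each prototype; pick the closest
    d = ((query_embs[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(1)
centers = 3.0 * rng.normal(size=(5, 64))    # 5 well-separated classes
labels = np.arange(5)                       # 5-way 1-shot support set
support = centers + 0.1 * rng.normal(size=(5, 64))
queries = centers + 0.1 * rng.normal(size=(5, 64))
pred = classify(queries, prototypes(support, labels, 5))
# When the embedding separates classes well, pred recovers the true labels.
```

The separation of the synthetic classes here is doing all the work, which is exactly the point: the quality of the frozen embedding, not the classifier on top, decides few-shot accuracy.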
What's Left of Few-Shot Learning?
Intellectual honesty requires acknowledging the niches where dedicated few-shot methods still hold value. They exist, but the list is shorter than the field wants to admit.
Still viable:

- **Edge and offline deployment:** When you literally cannot call an API. Edge devices, offline scenarios, privacy constraints.
- **Genuinely rare domains:** Domains where foundation models have limited pretraining data. But this window is closing.
- **Robotics:** Real-world physical interaction data is scarce and expensive. Meta-learning still adds value here.

Already absorbed:

- **NLP few-shot classification:** GPT-3 killed this in 2020. In-context learning is strictly better than any meta-learning approach.
- **Few-shot image classification:** CLIP and DINOv2 made this pointless. Zero-shot, or a linear probe on frozen features, wins.
- **Cross-lingual few-shot transfer:** Multilingual LLMs handle this natively. No need for specialized few-shot transfer.
The pattern: Few-shot methods remain viable only in domains where foundation model pretraining data is scarce or where inference-time constraints prevent using large models. As foundation models expand to new modalities and edge deployment improves, even these niches will shrink.
The Pattern: Bitter Lesson, Chapter Two
This is not the first time a specialized AI subfield has been absorbed by general-purpose scale. It is the same story Rich Sutton identified in 2019, playing out again with remarkable fidelity.
The Bitter Lesson Pattern
- Researchers encode domain knowledge into specialized methods
- These methods work well on carefully constructed benchmarks
- The field grows: workshops, surveys, benchmarks, taxonomies
- A general method powered by scale casually matches the results
- The specialized field enters denial, then bargaining, then niche-seeking
- The general method improves further. The field quietly pivots.
Previous Victims
Speech recognition, computer vision, and game playing each went through the same cycle: hand-engineered, knowledge-rich methods gave way to general methods powered by computation. As Sutton put it:

"The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin." — Rich Sutton, "The Bitter Lesson" (2019)
The Counterarguments (And Why They're Weak)
"Foundation models use few-shot learning internally"
This is technically true and entirely beside the point. In-context learning in LLMs is mechanistically related to meta-learning. But the implication is devastating for the field: the best few-shot learner is one that wasn't trained to do few-shot learning at all. It emerged from next-token prediction at scale. If your specialized training procedure produces worse few-shot performance than an emergent capability of a general model, your specialization adds negative value.
"Few-shot methods are more efficient"
Efficient at training, yes. But nobody cares about training efficiency for a model you train once and use forever. Foundation models amortize their training cost across billions of users and tasks. The per-task cost is vanishingly small. And at inference time, a CLIP classification is just a dot product — identical cost to ProtoNets.
"We need few-shot for novel domains"
This was true in 2020. In 2026, foundation models cover vision, language, audio, video, code, molecules, proteins, weather, and genomics. The "novel domain" argument keeps retreating to increasingly niche territory. At some point, you have to ask: are you solving a real problem, or defending a research agenda?
"The benchmarks aren't fair"
This is the most honest objection. Foundation models have seen far more data than few-shot methods are allowed to use — they've arguably seen the test distribution during pretraining. But that's exactly the point. If you can pretrain on diverse data and get few-shot for free, why would you do it the hard way? The benchmark comparison isn't about fairness. It's about practical relevance.
Implications for Researchers
Stop Doing
- Publishing miniImageNet results as if they prove anything
- Building meta-learning algorithms that don't compare against frozen foundation-model baselines
- Proposing new episodic training procedures for vision
- Calling in-context learning "few-shot" to boost citation counts
Start Doing
- Studying in-context learning as a mechanistic phenomenon
- Working on efficient adaptation of foundation models (LoRA, adapters, prompt tuning)
- Focusing on domains where pretraining data genuinely doesn't exist
- Building on foundation model representations instead of competing with them
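For scale, the LoRA idea fits in a dozen lines: freeze the pretrained weight `W` and train only a low-rank update `B @ A`. The dimensions and rank below are illustrative assumptions, and real LoRA wraps specific layers (typically attention projections) rather than a single matrix:

```python
import numpy as np

# LoRA in one picture: W stays frozen; only the rank-r factors train.
d, r = 1024, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))           # frozen pretrained weight
A = 0.01 * rng.normal(size=(r, d))    # trainable down-projection
B = np.zeros((d, r))                  # zero init: the update starts as a no-op

def lora_forward(x):
    return W @ x + B @ (A @ x)        # only A and B receive gradients

lora_params = A.size + B.size         # 2 * d * r = 16,384
full_params = W.size                  # d * d    = 1,048,576
print(f"trainable fraction: {lora_params / full_params:.4%}")
```

The zero-initialized `B` is the standard trick: at step zero the adapted model is exactly the pretrained model, so training starts from the foundation rather than away from it.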
The Honest Question
If you are working on few-shot learning in 2026, ask yourself: am I solving this problem because it's genuinely unsolved, or because I have a hammer and this looks like a nail? The best few-shot learning researchers have already pivoted — Chelsea Finn works on robot foundation models, Oriol Vinyals led Gemini at DeepMind. They read the writing on the wall. The meta-learners meta-learned a new career direction.
The Deeper Lesson
Few-shot learning's absorption into foundation models isn't a failure of the researchers. It's a success of the representations. The field asked "how do we learn from little data?" and the answer turned out to be: "pretrain on a lot of data first, then everything is few-shot."
This is, in a way, a vindication of the few-shot learning intuition. The core insight — that good representations enable rapid adaptation — was exactly right. The mistake was thinking that few-shot-specific training was the best way to get those representations.
The field didn't fail. It succeeded so completely that it became a footnote in a larger story. The research on metric learning, prototype computation, and gradient-based meta-learning laid the intellectual foundations for understanding why in-context learning works. That contribution is real. But the practical methods? Those are done.
What Comes Next
More subfields will be absorbed
Domain adaptation, transfer learning, multi-task learning — each faces the same existential question. If a single foundation model handles all your tasks, what is the purpose of your specialized transfer method?
The research focus shifts to adaptation efficiency
The interesting question is no longer "how do we learn from few examples" but "how do we efficiently adapt a foundation model to a specific task." This is where LoRA, prefix tuning, and prompt engineering live. It's a better question.
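A back-of-envelope parameter count makes the "adaptation efficiency" framing concrete. The model dimensions below describe a hypothetical 7B transformer and are assumptions for illustration, not measurements of any real model:

```python
# Trainable-parameter counts for three ways of adapting a hypothetical
# 7B transformer (32 layers, hidden size 4096).
layers, d, rank, prompt_len = 32, 4096, 8, 20

full_finetune = 7_000_000_000
lora = layers * 4 * (2 * d * rank)    # LoRA on q, k, v, o projections
prompt_tuning = prompt_len * d        # a handful of trainable embedding vectors

for name, n in [("full fine-tune", full_finetune),
                ("LoRA (r=8)", lora),
                ("prompt tuning", prompt_tuning)]:
    print(f"{name:>15}: {n:>13,} params ({n / full_finetune:.5%} of full)")
```

Under these assumptions, LoRA trains roughly 0.1% of the weights and prompt tuning roughly 0.001%, which is why "how cheaply can we adapt" has replaced "how little data can we learn from" as the operative question.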
The last niches will fall
On-device learning, molecular property prediction, robotic manipulation — each has its own foundation model coming. RT-2 for robotics. MolBERT and beyond for chemistry. When those arrive, the last justifications for dedicated few-shot methods evaporate.
Few-shot learning is dead.
Its ideas live on in every foundation model that adapts to new tasks from a handful of examples — or none at all. The best epitaph a research field can have is that its central problem was solved so thoroughly that nobody needs to think about it anymore.
The bitter lesson strikes again.
References

- Brown, T., et al. (2020). "Language Models are Few-Shot Learners." NeurIPS 2020.
- Radford, A., et al. (2021). "Learning Transferable Visual Models From Natural Language Supervision." ICML 2021.
- Sutton, R. (2019). "The Bitter Lesson."