
Zero-Shot Classification

Zero-shot classification asks a model to categorize text into labels it has never been explicitly trained on — the ultimate test of language understanding and generalization. The breakthrough was the natural language inference (NLI) trick: reframe classification as "does this text entail the label?" using models fine-tuned on MNLI, pioneered by Yin et al. (2019) and popularized by BART-large-MNLI. Today, instruction-tuned LLMs have largely subsumed this approach — GPT-4, Claude, and Llama 3 can classify into arbitrary taxonomies via prompting with near-supervised accuracy. The remaining challenge is consistency and calibration: LLMs are powerful but their predictions can be brittle to prompt phrasing, making them unreliable for high-stakes automated pipelines without careful engineering.
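The NLI reframing described above can be sketched in a few lines: each candidate label is turned into an entailment hypothesis (e.g. "This example is about sports."), the premise–hypothesis pair is scored for entailment, and a softmax over the per-label entailment logits yields label probabilities. The sketch below uses a toy keyword scorer as a stand-in for the entailment model; in practice the scores would come from an MNLI-fine-tuned checkpoint such as BART-large-MNLI. The function and template names are illustrative, not from any particular library.

```python
import math

def classify_zero_shot(text, labels, entail_score,
                       template="This example is about {}."):
    """Rank candidate labels by softmaxed entailment scores.

    entail_score(premise, hypothesis) -> float is supplied by the caller;
    with an MNLI model it would be the entailment logit for the pair.
    """
    logits = [entail_score(text, template.format(lab)) for lab in labels]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]          # stable softmax
    total = sum(exps)
    return sorted(zip(labels, (e / total for e in exps)),
                  key=lambda pair: -pair[1])

# Toy stand-in scorer: counts topic keywords in the premise.
# A real system would replace this with an NLI model's entailment logit.
TOPIC_WORDS = {
    "finance": {"stock", "market", "earnings"},
    "sports": {"goal", "match", "team"},
}

def toy_score(premise, hypothesis):
    words = set(premise.lower().replace(".", "").split())
    for label, keywords in TOPIC_WORDS.items():
        if label in hypothesis:
            return len(words & keywords)
    return 0

ranked = classify_zero_shot(
    "The stock market rallied after the earnings report.",
    ["finance", "sports"],
    toy_score,
)
print(ranked[0][0])  # highest-probability label
```

Swapping `toy_score` for a real entailment scorer is the only change needed to turn this into the BART-large-MNLI approach; the label-to-hypothesis templating and softmax normalization stay the same.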

1 dataset tracked · 3 results · Canonical metric: accuracy

Canonical Benchmark

XNLI

Cross-lingual natural language inference across 15 languages

Primary metric: accuracy

Top 10

Leading models on XNLI.

Rank  Model              Accuracy  Year  Source
1     GPT-4              87.4      2023  paper
2     XLM-RoBERTa-large  83.6      2019  paper
3     mDeBERTa-v3-base   80.8      2022  paper

All datasets

1 dataset tracked for this task.

Related tasks

Other tasks in Natural Language Processing.

Run Inference

Looking to run a model? HuggingFace hosts inference for this task type.
