Natural Language Processing

Processing and understanding text? Evaluate your models on language understanding, generation, translation, and information extraction benchmarks.

9 tasks 6 datasets 0 results

Language Modeling

Predicting the next word or token in a sequence. Core task for GPT-style models.

0 datasets 0 results
No datasets indexed yet. Contribute on GitHub
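To make "predicting the next token" concrete, here is a minimal sketch using a toy bigram counter. It is an illustration only: GPT-style models use neural networks rather than raw counts, and the names `train_bigram` and `predict_next` and the sample corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if `prev` was never seen."""
    followers = counts.get(prev)  # .get avoids creating an empty entry
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, whitespace-tokenized.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

The same interface (context in, most likely next token out) is what a language modeling benchmark exercises, just with far larger contexts and vocabularies.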

Machine Translation

Translating text from one language to another (WMT benchmarks).

0 datasets 0 results
No datasets indexed yet. Contribute on GitHub

Question Answering

Answering questions based on context (SQuAD, Natural Questions).

1 dataset 0 results
SQuAD v2.0: Stanford Question Answering Dataset v2.0 (2018)

150K questions on Wikipedia articles, including 50K unanswerable questions. Tests reading comprehension and the ability to recognize when a question cannot be answered from the passage.
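A minimal sketch of how answerable and unanswerable questions might be scored together with an exact-match check. This is a simplified illustration, not the official evaluation script; the helper names `normalize` and `exact_match` are hypothetical, and treating an empty prediction as "no answer" is an assumption of this sketch.

```python
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """Return 1 if the prediction matches any gold answer after normalization.

    An empty `gold_answers` list marks the question as unanswerable
    (an assumption of this sketch); the model is then correct only if
    it abstains by predicting an empty string.
    """
    if not gold_answers:
        return int(normalize(prediction) == "")
    return int(any(normalize(prediction) == normalize(g) for g in gold_answers))


print(exact_match("The Eiffel Tower.", ["Eiffel Tower"]))  # answerable, matches
print(exact_match("", []))                                 # unanswerable, model abstains
print(exact_match("Paris", []))                            # unanswerable, model answers anyway
```

Scoring abstention this way means a model cannot do well by always guessing a span: answering an unanswerable question counts as an error.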

Text Classification

Categorizing text into predefined classes (sentiment, topic).

2 datasets 0 results
GLUE: General Language Understanding Evaluation (2018)

Collection of 9 NLU tasks including sentiment analysis, textual entailment, and question answering. Standard benchmark for general language understanding.

SuperGLUE (2019)

More difficult successor to GLUE with 8 challenging tasks. Designed to be hard for the models available at its release.

Named Entity Recognition

Identifying and classifying named entities in text (CoNLL).

1 dataset 0 results
CoNLL-2003: CoNLL-2003 Named Entity Recognition (2003)

Reuters news stories annotated with 4 entity types: PER, ORG, LOC, MISC. The standard NER benchmark.
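A minimal sketch of turning per-token tags into entity spans for the four types listed above. It assumes a BIO tagging scheme (`B-` starts an entity, `I-` continues it, `O` is outside any entity); the tag format, the function name `bio_to_spans`, and the sample sentence are assumptions of this sketch rather than details from the listing.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans = []
    current_tokens, current_type = [], None

    def flush():
        # Close the entity being built, if any.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # A B- tag, or an I- tag whose type differs, opens a new entity.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:
            # "O" tag: outside any entity.
            flush()
            current_tokens, current_type = [], None
    flush()
    return spans


# Hypothetical example sentence.
tokens = ["Angela", "Merkel", "visited", "Reuters", "in", "London"]
tags = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
```

NER systems are typically compared on whole spans like these rather than on individual token tags, so a prediction that gets only part of an entity right does not count as a match.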

Text Summarization

Generating concise summaries of longer documents (CNN/DailyMail, XSum).

1 dataset 0 results
CNN/DailyMail: CNN/DailyMail Summarization (2015)

300K news articles with multi-sentence summaries. Standard benchmark for abstractive summarization.

Natural Language Inference

Determining entailment relationships between sentences (SNLI, MNLI).

1 dataset 0 results
SNLI: Stanford Natural Language Inference (2015)

570K human-written English sentence pairs, manually labeled for balanced classification as entailment, contradiction, or neutral.

Semantic Textual Similarity

Measuring similarity between text pairs (STS Benchmark).

0 datasets 0 results
No datasets indexed yet. Contribute on GitHub
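A minimal sketch of scoring the similarity of a text pair, using cosine similarity over bag-of-words counts. This is a crude stand-in chosen for illustration: it ignores word order and meaning, it is not how STS systems are built or evaluated, and the function name `cosine_similarity` is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between whitespace-tokenized, lowercased word counts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


print(cosine_similarity("A man is playing a guitar", "A man plays the guitar"))
print(cosine_similarity("A man is playing a guitar", "The stock market fell"))
```

A word-overlap score like this misses paraphrases with no shared vocabulary, which is the kind of gap a semantic similarity benchmark is meant to expose.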

Reading Comprehension

Understanding and answering questions about passages.

0 datasets 0 results
No datasets indexed yet. Contribute on GitHub