General Language Understanding Evaluation for masked language models
3 results indexed across 1 metric. Shaded row marks current SOTA; ties broken by submission date.
| # | Model | Org | Submitted | Paper / code | avg-score |
|---|---|---|---|---|---|
| 01 | DeBERTa-v3-large (OSS) | Microsoft | Jan 2023 | DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Tra… | 91.37 |
| 02 | ALBERT-xxlarge-v2 (OSS) | Google Research | Feb 2020 | ALBERT: A Lite BERT for Self-supervised Learning of Lang… | 89.40 |
| 03 | RoBERTa-large (OSS) | Facebook AI | Jul 2019 | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 88.50 |
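The page does not spell out how avg-score is aggregated; a reasonable assumption is an unweighted mean over the per-task GLUE scores, as in this minimal sketch (the task scores below are placeholders, not leaderboard results):

```python
# Placeholder per-task scores (NOT real results); the real values come from the
# evaluation run over each GLUE task.
task_scores = {
    "cola": 70.0, "sst2": 95.0, "mrpc": 90.0, "stsb": 91.0,
    "qqp": 92.0, "mnli": 90.0, "qnli": 94.0, "rte": 85.0,
}

# Unweighted mean over tasks (assumed aggregation for the avg-score column).
avg_score = sum(task_scores.values()) / len(task_scores)
print(f"avg-score: {avg_score:.2f}")
```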
Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.
Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.
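The exact contents of the reproduction script are not specified here, so the following is a minimal sketch under stated assumptions: a Hugging Face-style checkpoint, the `transformers`, `datasets`, and `evaluate` libraries, and evaluation of a single already fine-tuned checkpoint on one GLUE task (MRPC). The checkpoint name is a placeholder for your own.

```python
# Minimal reproduction sketch (assumptions: Hugging Face checkpoint, MRPC only;
# a full submission would cover every task and report the averaged score).
import torch
from datasets import load_dataset
from evaluate import load as load_metric
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINT = "your-org/your-finetuned-mrpc-checkpoint"  # placeholder name
TASK = "mrpc"
BATCH_SIZE = 32

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT).eval()

dataset = load_dataset("glue", TASK, split="validation")
metric = load_metric("glue", TASK)

with torch.no_grad():
    for start in range(0, len(dataset), BATCH_SIZE):
        batch = dataset[start:start + BATCH_SIZE]  # dict of column -> list
        inputs = tokenizer(
            batch["sentence1"],
            batch["sentence2"],
            padding=True,
            truncation=True,
            return_tensors="pt",
        )
        logits = model(**inputs).logits
        metric.add_batch(
            predictions=logits.argmax(dim=-1).tolist(),
            references=batch["label"],
        )

print(metric.compute())  # e.g. {'accuracy': ..., 'f1': ...} for MRPC
```

Install the assumed dependencies with `pip install torch transformers datasets evaluate` before running; adapt the task name and input columns for the other GLUE tasks.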