Codesota · Benchmark · GLUE

GLUE.

General Language Understanding Evaluation for masked language models

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

Only 3 models on this benchmark

Avg Score

Avg Score is the headline GLUE metric: the average of a model's scores across the benchmark's tasks. Codesota tracks published Avg Score values so readers can compare state-of-the-art results across sources and model families.

Higher is better
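As a rough illustration of how an average score of this kind is typically formed and ranked, the sketch below macro-averages per-task scores (averaging multiple metrics within a task first) and then sorts models so that higher is better. The helper names `glue_average` and `rank` and the per-task numbers in `example` are hypothetical placeholders, not values from this page; which tasks a given source includes in its average is also an assumption that varies between papers. Only the three model scores in `entries` are taken from the leaderboard.

```python
from statistics import mean


def glue_average(task_scores):
    """Macro-average: mean of metrics within each task, then mean across tasks."""
    per_task = [mean(list(metrics.values())) for metrics in task_scores.values()]
    return mean(per_task)


def rank(entries):
    """Sort (model, score) pairs from best to worst; higher is better."""
    return sorted(entries, key=lambda pair: pair[1], reverse=True)


# Hypothetical per-task scores, for illustration only.
example = {
    "CoLA": {"mcc": 60.0},
    "SST-2": {"accuracy": 94.0},
    "MRPC": {"f1": 88.0, "accuracy": 84.0},  # two metrics, averaged to 86.0
}

# Avg Score values as listed on this leaderboard.
entries = [
    ("RoBERTa-large", 88.5),
    ("DeBERTa-v3-large", 91.37),
    ("ALBERT-xxlarge-v2", 89.4),
]

if __name__ == "__main__":
    print(glue_average(example))
    for position, (model, score) in enumerate(rank(entries), start=1):
        print(f"{position:02d} {model} {score}")
```

Averaging within a task before averaging across tasks keeps a task with two metrics from counting twice in the overall score.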

Trust tiers for Avg Score: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Notes |
|------|-------|-------|-------|------|-------|
| 01 | DeBERTa-v3-large | verified | 91.37 | 2023 | DeBERTa-v3-large GLUE test average. From DeBERTaV3 paper Table 2 and GLUE leaderboard. |
| 02 | ALBERT-xxlarge-v2 | verified | 89.4 | 2020 | ALBERT-xxlarge-v2 GLUE test average (single model, not ensemble). From ALBERT paper Table 1. |
| 03 | RoBERTa-large | verified | 88.5 | 2019 | RoBERTa-large GLUE test average (single model). From original RoBERTa paper and GLUE leaderboard. |
§ 04 · Submit a result

Add to the leaderboard.