Text Classification · 2019 · en

SuperGLUE

A more difficult successor to GLUE, comprising 8 challenging tasks designed to be hard for the models that were current at its 2019 release.

Metrics: average-score
Current State of the Art

DeBERTa-v3-large

Microsoft

91.4

average-score

SuperGLUE — average-score

7 results · 1 SOTA advance · higher is better

[Figure: SuperGLUE average-score over time (x-axis: year, 2021 to 2025; y-axis: average-score, 84 to 92), with views for all results and the SOTA frontier; labeled point: DeBERTa-v3-large]

Top Models Performance Comparison

Top 7 models ranked by average-score

Rank  Model             average-score  % of best
1     DeBERTa-v3-large  91.4           100.0%
2     ST-MoE-32B        91.2           99.8%
3     GPT-4o            90.3           98.8%
4     Gemini Ultra      90.0           98.5%
5     PaLM 2 (Large)    87.3           95.5%
6     Llama 3.1 405B    86.7           94.9%
7     Qwen2 72B         85.4           93.4%

Best Score: 91.4
Top Model: DeBERTa-v3-large
Models Compared: 7
Score Range: 6.0
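
The "% of best" column and the summary figures follow from simple arithmetic on the listed scores. A minimal sketch, assuming "% of best" means each score divided by the top score and "Score Range" means best minus worst (the page does not state either definition; the function name `summarize` is hypothetical):

```python
# Scores copied from the ranking above (average-score, higher is better).
scores = {
    "DeBERTa-v3-large": 91.4,
    "ST-MoE-32B": 91.2,
    "GPT-4o": 90.3,
    "Gemini Ultra": 90.0,
    "PaLM 2 (Large)": 87.3,
    "Llama 3.1 405B": 86.7,
    "Qwen2 72B": 85.4,
}


def summarize(scores):
    """Recompute the summary figures under the assumptions stated above."""
    top_model = max(scores, key=scores.get)
    best = scores[top_model]
    worst = min(scores.values())
    return {
        "best_score": best,
        "top_model": top_model,
        "models_compared": len(scores),
        # Assumed definition: best score minus worst score.
        "score_range": round(best - worst, 1),
        # Assumed definition: score as a percentage of the best score.
        "pct_of_best": {m: round(100 * s / best, 1) for m, s in scores.items()},
    }


summary = summarize(scores)
for model, pct in summary["pct_of_best"].items():
    print(f"{model}: {pct}%")
```

Under these assumptions the recomputed values agree with the figures listed on the page, which suggests the percentages are rounded to one decimal place.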

average-score (Primary)

#  Model                           Organization     Score  Date
1  DeBERTa-v3-large (Open Source)  Microsoft        91.4   Nov 2021
2  ST-MoE-32B (Open Source)        Google           91.2   Feb 2022
3  GPT-4o (API)                    OpenAI           90.3   Mar 2023
4  Gemini Ultra                    Google DeepMind  90.0   Dec 2023
5  PaLM 2 (Large)                  Google           87.3   May 2023
6  Llama 3.1 405B (Open Source)    Meta             86.7   Jul 2024
7  Qwen2 72B                       Alibaba          85.4   Jul 2024
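
The "SOTA advances" count reported for this benchmark can be reproduced from the dated results. A minimal sketch, assuming a SOTA advance is any result that strictly beats every earlier-dated result (the page does not define the term; `sota_frontier` is a hypothetical helper, and the day of month is a placeholder because only months are listed):

```python
from datetime import date

# (model, score, date) copied from the results table; day fixed to 1 as a placeholder.
results = [
    ("DeBERTa-v3-large", 91.4, date(2021, 11, 1)),
    ("ST-MoE-32B", 91.2, date(2022, 2, 1)),
    ("GPT-4o", 90.3, date(2023, 3, 1)),
    ("Gemini Ultra", 90.0, date(2023, 12, 1)),
    ("PaLM 2 (Large)", 87.3, date(2023, 5, 1)),
    ("Llama 3.1 405B", 86.7, date(2024, 7, 1)),
    ("Qwen2 72B", 85.4, date(2024, 7, 1)),
]


def sota_frontier(results):
    """Return the results that set a new best score, in chronological order."""
    frontier = []
    best = float("-inf")
    for model, score, when in sorted(results, key=lambda r: r[2]):
        if score > best:  # strictly better than everything dated earlier
            best = score
            frontier.append((model, score, when))
    return frontier


frontier = sota_frontier(results)
print(len(frontier), [model for model, _, _ in frontier])
```

With the listed dates, the earliest entry is also the highest-scoring one, so the frontier contains a single point.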
