Text Classification · 2019 · en
SuperGLUE
A more difficult successor to GLUE, made up of 8 challenging tasks and designed to be hard for the models that were current when it was introduced.
Current State of the Art
DeBERTa-v3-large (Microsoft)
91.4 average-score
SuperGLUE: average-score
7 results · 1 SOTA advance · higher is better
Top Models Performance Comparison
Top 7 models ranked by average-score
Best Score: 91.4
Top Model: DeBERTa-v3-large
Models Compared: 7
Score Range: 6.0
average-score (primary metric)
| # | Model | Organization | Access | Score | Date |
|---|---|---|---|---|---|
| 1 | DeBERTa-v3-large | Microsoft | Open Source | 91.4 | Nov 2021 |
| 2 | ST-MoE-32B | Google | Open Source | 91.2 | Feb 2022 |
| 3 | GPT-4o | OpenAI | API | 90.3 | Mar 2023 |
| 4 | Gemini Ultra | Google DeepMind | | 90.0 | Dec 2023 |
| 5 | PaLM 2 (Large) | Google | | 87.3 | May 2023 |
| 6 | Llama 3.1 405B | Meta | Open Source | 86.7 | Jul 2024 |
| 7 | Qwen2 72B | Alibaba | | 85.4 | Jul 2024 |
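The summary figures on this page (best score, top model, models compared, score range, and the count of SOTA advances) can all be rederived from the table rows. The following is a minimal sketch in Python, not the site's actual implementation. The names `RESULTS`, `summarize`, and `sota_frontier` are hypothetical, and the definition of a "SOTA advance" as a result that strictly beats every earlier-dated result is an assumption about how the page counts them.

```python
# Rows copied from the leaderboard table: (model, score, date).
RESULTS = [
    ("DeBERTa-v3-large", 91.4, "Nov 2021"),
    ("ST-MoE-32B", 91.2, "Feb 2022"),
    ("GPT-4o", 90.3, "Mar 2023"),
    ("Gemini Ultra", 90.0, "Dec 2023"),
    ("PaLM 2 (Large)", 87.3, "May 2023"),
    ("Llama 3.1 405B", 86.7, "Jul 2024"),
    ("Qwen2 72B", 85.4, "Jul 2024"),
]

MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def date_key(date):
    """Turn 'Nov 2021' into a sortable (year, month) tuple."""
    month, year = date.split()
    return int(year), MONTHS[month]


def summarize(results):
    """Recompute the summary panel from the table rows."""
    scores = [score for _, score, _ in results]
    top_model, best_score, _ = max(results, key=lambda row: row[1])
    return {
        "best_score": best_score,
        "top_model": top_model,
        "models_compared": len(results),
        # Rounded to one decimal to avoid floating-point noise.
        "score_range": round(max(scores) - min(scores), 1),
    }


def sota_frontier(results):
    """Results that strictly beat every earlier result (assumed definition)."""
    frontier, best = [], float("-inf")
    for model, score, date in sorted(results, key=lambda row: date_key(row[2])):
        if score > best:
            frontier.append((model, score, date))
            best = score
    return frontier


if __name__ == "__main__":
    print(summarize(RESULTS))
    print(len(sota_frontier(RESULTS)), "SOTA advance(s)")
```

Under these assumptions the earliest entry is also the highest-scoring one, so the frontier contains a single result, which matches the "1 SOTA advance" figure shown above.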
Related Papers (7)
The Llama 3 Herd of Models
Jul 2024 · Models: Llama 3.1 405B
Qwen2 Technical Report
Jul 2024 · Models: Qwen2 72B
Gemini: A Family of Highly Capable Multimodal Models
Dec 2023 · Models: Gemini Ultra
PaLM 2 Technical Report
May 2023 · Models: PaLM 2 (Large)
GPT-4 Technical Report
Mar 2023 · Models: GPT-4o
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Feb 2022 · Models: ST-MoE-32B
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
Nov 2021 · Models: DeBERTa-v3-large