SuperGLUE is a more difficult successor to GLUE, made up of 8 challenging language understanding tasks. It was designed to be hard for the models available when it was released.
Average Score is the reported evaluation metric for SuperGLUE. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | DeBERTa-v3-large | verified | 91.4 | 2021 | Paper ↗ |
| 02 | ST-MoE-32B | verified | 91.2 | 2022 | Paper ↗, Source ↗ |
| 03 | GPT-4o | verified | 90.3 | 2023 | Paper ↗, Source ↗ |
| 04 | Gemini Ultra | verified | 90 | 2023 | Paper ↗ |
| 05 | PaLM 2 (Large) | verified | 87.3 | 2023 | Paper ↗ |
| 06 | Llama 3.1 405B | verified | 86.7 | 2024 | Paper ↗ |
| 07 | Qwen2 72B | verified | 85.4 | 2024 | Paper ↗ |
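
For readers unfamiliar with how a single SuperGLUE number is formed, the sketch below illustrates the usual convention: metrics are first averaged within each task, and the per-task values are then averaged with equal weight. It assumes the Average Score tracked here follows that convention, which this page does not state. The per-task numbers and the helper name `superglue_average` are hypothetical placeholders chosen for illustration, not results for any model in the table.

```python
from statistics import mean

# Hypothetical placeholder values on a 0-100 scale (NOT real model results).
# Tasks that report two metrics list both; they are averaged within the task.
task_metrics = {
    "BoolQ": [90.0],          # accuracy
    "CB": [95.0, 97.0],       # F1, accuracy
    "COPA": [98.0],           # accuracy
    "MultiRC": [88.0, 62.0],  # F1a, exact match
    "ReCoRD": [94.0, 93.0],   # F1, exact match
    "RTE": [92.0],            # accuracy
    "WiC": [76.0],            # accuracy
    "WSC": [96.0],            # accuracy
}


def superglue_average(task_metrics):
    """Average the metrics within each task, then average across tasks."""
    per_task = [mean(values) for values in task_metrics.values()]
    return mean(per_task)


score = superglue_average(task_metrics)
print(f"Average score: {score:.1f}")
```

Because every task carries equal weight, a weak result on one task lowers the overall number as much as a weak result on any other, regardless of how many examples each task contains.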
Score is the reported evaluation metric for SuperGLUE. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | ByT5 XXL | unverified | 88.6 | 2021 | Paper ↗, Code ↗ |