Codesota · Benchmark · SuperGLUE

SuperGLUE.

A more difficult successor to GLUE, made up of 8 challenging language-understanding tasks. It was designed to be hard for the models available at its release.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


average-score

Average Score is the reported evaluation metric for SuperGLUE. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
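As a rough illustration of what an average score over SuperGLUE means, the sketch below assumes the convention described for the benchmark itself: each of the 8 tasks contributes one number, tasks that report two metrics (CB, MultiRC, ReCoRD) are averaged internally first, and the overall score is the unweighted mean of the per-task numbers. This page does not state how each listed source computed its figure, so treat this as an assumption rather than Codesota's method. The function name `superglue_average` and all per-task values are hypothetical.

```python
from statistics import mean


def superglue_average(task_metrics: dict[str, list[float]]) -> float:
    """Unweighted mean of per-task scores, rounded to one decimal.

    Each task maps to a list of its metric values; tasks with more than
    one metric are averaged first (assumed convention, see lead-in).
    """
    if not task_metrics:
        raise ValueError("at least one task is required")
    per_task = [mean(metrics) for metrics in task_metrics.values()]
    return round(mean(per_task), 1)


# Hypothetical per-task results, not taken from any row on this page.
example = {
    "BoolQ": [90.0],          # accuracy
    "CB": [95.0, 97.0],       # F1, accuracy
    "COPA": [98.0],           # accuracy
    "MultiRC": [88.0, 62.0],  # F1a, exact match
    "ReCoRD": [94.0, 93.0],   # F1, exact match
    "RTE": [92.0],            # accuracy
    "WiC": [76.0],            # accuracy
    "WSC": [95.5],            # accuracy
}

print(superglue_average(example))  # 89.5
```

Because sources may evaluate on different splits or in different settings (fine-tuned versus few-shot), two figures produced this way are only comparable when the underlying setups match.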

Trust tiers for average-score: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Links | Notes |
|------|-------|-------|-------|------|-------|-------|
| 01 | DeBERTa-v3-large | verified | 91.4 | 2021 | Paper | DeBERTa-v3-Large fine-tuned. Source: Table 3, arXiv:2111.09543. |
| 02 | ST-MoE-32B | verified | 91.2 | 2022 | Paper, Source | ST-MoE-32B fine-tuned. SuperGLUE leaderboard SOTA. Source: arXiv:2202.08906. |
| 03 | GPT-4o | verified | 90.3 | 2023 | Paper, Source | GPT-4o few-shot. |
| 04 | Gemini Ultra | verified | 90 | 2023 | Paper | Gemini Ultra. Source: Gemini technical report, Table 6. |
| 05 | PaLM 2 (Large) | verified | 87.3 | 2023 | Paper | PaLM 2 Large. Source: PaLM 2 technical report. |
| 06 | Llama 3.1 405B | verified | 86.7 | 2024 | Paper | Llama 3.1 405B Instruct. Source: Llama 3 paper. |
| 07 | Qwen2 72B | verified | 85.4 | 2024 | Paper | Qwen2 72B Instruct. Source: Qwen2 technical report. |

Score

Score is the reported evaluation metric for SuperGLUE. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for Score: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Links |
|------|-------|-------|-------|------|-------|
| 01 | ByT5 XXL | unverified | 88.6 | 2021 | Paper, Code |

§ 04 · Submit a result

Add to the leaderboard.
