Who leads the WinoGrande benchmark?

gpt-4o currently leads WinoGrande with an accuracy score of 87.5.

What is the state-of-the-art score on WinoGrande?

The state-of-the-art result on WinoGrande is 87.5 (accuracy), achieved by gpt-4o as of 2026.

How many models are tracked on WinoGrande?

Codesota tracks 15 models on WinoGrande.

When was the WinoGrande leaderboard last updated?

The WinoGrande leaderboard on Codesota includes results through 2026, with the earliest tracked result from 2023.

WinoGrande.

Name: WinoGrande Benchmark Results
Creator: Unknown
Published: 2023-01-01
License: https://creativecommons.org/licenses/by/4.0/

44K Winograd-style problems requiring commonsense reasoning to resolve pronoun references.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

accuracy

Accuracy is the reported evaluation metric for WinoGrande. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for accuracy: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published: an earlier or same-year result already scored better.

Rank	Model	Trust	Score	Year	Links
01	gpt-4o	paper	87.5	2025	Source
02	Claude 3.5 Sonnet	unverified	85.4	2025	Source
03	claude-35-sonnet	paper	85.4	2025	Source
04	llama-3-70b	paper	85.3	2025	Source
05	Llama 3 70B	unverified	85.3	2025	Source
06	Trinity Large Base (5-shot)	unverified	80.82	2026	Paper, Code
07	Step-3.5-Flash Base	unverified	79.1	2026	Paper, Code
08	Chameleon 34B	unverified	78.5	2024	Paper, Code
09	LLaMA-65B	unverified	77	2023	Paper, Code
10	Apertus-70B	unverified	73.3	2025	Paper, Code
11	HRM-Text-1B	unverified	72.4	2026	Paper, Code
12	BitNet b1.58 2B4T	unverified	71.9	2025	Paper, Code
13	Helium	unverified	70	2024	Paper, Code
14	SmolLM2 (1.7B)	unverified	59.4	2025	Paper, Code
15	OLMo-2-7B-1124 (olmOCR-peS2o)	unverified	58	2025	Paper, Code
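
The muting rule stated above the table can be sketched in a few lines of Python. This is only an illustration of one reading of that rule (a row is muted when some other row from the same or an earlier year has a strictly higher score). The `Result` type and the `is_muted` helper are hypothetical names, not part of Codesota, and the rows are a subset copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    score: float  # accuracy as listed in the table
    year: int


def is_muted(row: Result, rows: list[Result]) -> bool:
    """Assumed rule: muted if an earlier or same-year result scored strictly better."""
    return any(o.year <= row.year and o.score > row.score for o in rows)


# Subset of the leaderboard rows above, used only for illustration.
ROWS = [
    Result("gpt-4o", 87.5, 2025),
    Result("Claude 3.5 Sonnet", 85.4, 2025),
    Result("Trinity Large Base (5-shot)", 80.82, 2026),
    Result("Chameleon 34B", 78.5, 2024),
    Result("LLaMA-65B", 77.0, 2023),
    Result("Helium", 70.0, 2024),
]

# Rows that were state of the art when published, in chronological order.
sota_history = [
    r.model for r in sorted(ROWS, key=lambda r: r.year) if not is_muted(r, ROWS)
]
print(sota_history)
```

Under this reading, a 2026 result below 87.5 is muted because the 2025 gpt-4o entry already scored higher. Since the table records only the year, the order of results within a single year cannot be recovered from it.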

§ 04 · Submit a result

Add to the leaderboard.
