WinoGrande


44K Winograd-style problems requiring commonsense reasoning to resolve pronoun references.

Benchmark Stats

Models: 3
Papers: 3
Metrics: 1

SOTA History

Not enough data to show trend.

Only 3 models on this benchmark

accuracy

Higher is better

Task: pronoun resolution requiring commonsense reasoning.

Rank  Model             Source     Score (%)  Year  Paper
1     gpt-4o            Editorial  87.5       2025  Source
2     claude-35-sonnet  Editorial  85.4       2025  Source
3     llama-3-70b       Editorial  85.3       2025  Source
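
The scores above are accuracy values, where higher is better. As a minimal sketch of how such a figure could be computed, assuming each problem has a gold label of "1" or "2" naming the correct option and the model emits one label per problem, the Python below counts exact matches. The function name `accuracy_percent` and the sample labels are hypothetical and used for illustration only; they are not taken from the leaderboard or from any model's actual output.

```python
def accuracy_percent(predictions, answers):
    """Return the percentage of predictions that match the gold answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted label equals the gold label.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical labels for four problems (illustration only).
gold = ["1", "2", "2", "1"]
pred = ["1", "2", "1", "1"]
print(accuracy_percent(pred, gold))  # 3 of 4 predictions match
```

Under these assumptions, a reported score such as 87.5 would mean that 87.5% of the evaluated problems were answered correctly.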
