WinoGrande

44K Winograd-style problems requiring commonsense reasoning to resolve pronoun references.

Benchmark Stats

Models: 3
Papers: 3
Metrics: 1

accuracy

Higher is better
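
As a rough illustration of how an accuracy score like the ones on this page could be computed, here is a minimal sketch in Python. It assumes each problem is a sentence with a blank, two candidate fillers, and a gold label of 1 or 2. The `Item` class, the `accuracy` function, and the two example sentences are hypothetical and are not taken from the benchmark data or from any official evaluation script.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One two-option fill-in-the-blank problem (hypothetical layout)."""
    sentence: str  # contains a blank written as "_"
    option1: str
    option2: str
    answer: int    # 1 or 2: which option correctly fills the blank


def accuracy(predictions, items):
    """Return the percentage of items whose predicted option matches the gold answer."""
    if len(predictions) != len(items):
        raise ValueError("predictions and items must have the same length")
    if not items:
        return 0.0
    correct = sum(1 for pred, item in zip(predictions, items) if pred == item.answer)
    return 100.0 * correct / len(items)


# Illustrative items only; these are not drawn from the benchmark itself.
items = [
    Item("The trophy doesn't fit in the suitcase because the _ is too big.",
         "trophy", "suitcase", 1),
    Item("The trophy doesn't fit in the suitcase because the _ is too small.",
         "trophy", "suitcase", 2),
]

# A model that always picks option 1 gets one of the two items right.
print(accuracy([1, 1], items))
```

Because every problem has exactly two options, a model that guesses at random would be expected to score about 50 on this scale, which is worth keeping in mind when reading the scores.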

Rank  Model             Code  Score  Paper / Source
1     gpt-4o            -     87.5   openai-blog
2     claude-35-sonnet  -     85.4   anthropic-blog
3     llama-3-70b       HF    85.3   meta-blog