winogrande
Unknown
Commonsense reasoning benchmark
Total Results: 3
Models Tested: 3
Metrics: 1
Last Updated: 2025-12-19
Metric: accuracy (higher is better)
Task: Pronoun resolution requiring commonsense reasoning.

| Rank | Model | Score | Source |
|---|---|---|---|
| 1 | gpt-4o | 87.5 | openai-blog |
| 2 | claude-35-sonnet | 85.4 | anthropic-blog |
| 3 | llama-3-70b | 85.3 | meta-blog |
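
The ranking in the table above follows from the "higher is better" rule for accuracy. As a minimal sketch, assuming the rows are held in a plain Python list (the `results` list and the `rank_by_accuracy` helper are hypothetical names introduced for illustration, not part of any leaderboard tooling), the order can be reproduced like this:

```python
# Rows copied from the table above: (model, accuracy, source).
results = [
    ("claude-35-sonnet", 85.4, "anthropic-blog"),
    ("llama-3-70b", 85.3, "meta-blog"),
    ("gpt-4o", 87.5, "openai-blog"),
]


def rank_by_accuracy(rows):
    """Return (rank, model, score) tuples, best score first.

    Accuracy is a higher-is-better metric, so rows are sorted in
    descending order of score. This helper is illustrative only.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [
        (rank, model, score)
        for rank, (model, score, _source) in enumerate(ordered, start=1)
    ]


ranking = rank_by_accuracy(results)
for rank, model, score in ranking:
    print(f"{rank}. {model}: {score}")
```

Because the scores come from different sources, a sort like this only orders the reported numbers; it does not account for differences in evaluation setup between those sources.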