hellaswag

OCR benchmark

Total Results: 4
Models Tested: 4
Metrics: 1
Last Updated: 2025-12-21

accuracy

Higher is better
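
For readers unfamiliar with the metric, here is a minimal sketch of how an accuracy percentage can be computed for a multiple-choice task. The function name `accuracy` and the sample data are hypothetical and are not taken from this leaderboard or from any model's evaluation harness; the published scores may have been produced under different settings (for example, different few-shot prompts).

```python
def accuracy(predictions, labels):
    """Return the percentage of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count the items where the chosen option equals the gold option.
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical data: index of the chosen ending for five items.
predicted = [0, 2, 1, 3, 0]
gold = [0, 2, 1, 1, 0]
print(f"{accuracy(predicted, gold):.1f}")  # prints 80.0
```

Four of the five hypothetical predictions match, so the sketch reports 80.0. A higher value means more items were answered correctly.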

Rank  Model             Score  Source
1     gpt-4o            95.3   openai-blog
2     gemini-15-pro     92.5   google-blog
3     claude-35-sonnet  89     anthropic-blog
4     llama-3-70b       88     meta-blog

Note (gpt-4o entry): Commonsense NLI. Human performance is 95.6%; the top score listed here, 95.3, is just below it.
