All Verified Results
349 benchmark results across 51 datasets. Every data point links to its source.
Total Results: 349
Benchmarks: 51
Models: 156
JSON API: Download raw data at /data/benchmarks.json
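As a rough illustration of how the raw data could be used, the sketch below recomputes the three summary counts (results, benchmarks, models) from a list of records. The schema of /data/benchmarks.json is not documented on this page, so the field names `model`, `dataset`, `value`, and `status`, and the use of `null` for an unknown value, are assumptions. The inline sample stands in for the downloaded file.

```python
import json

# Stand-in for the contents of /data/benchmarks.json.
# ASSUMED schema: a JSON array of objects with "model", "dataset",
# "value", and "status" keys. The real file may be shaped differently.
SAMPLE_JSON = """
[
  {"model": "trocr-large", "dataset": "sroie", "value": 96.58,
   "status": "needs-pdf-verification"},
  {"model": "trocr-large", "dataset": "iam", "value": 2.89,
   "status": "needs-pdf-verification"},
  {"model": "paddleocr-v4", "dataset": "icdar-2015", "value": null,
   "status": "needs-documentation-verification"}
]
"""


def summarize(records):
    """Count total results, distinct datasets, and distinct models."""
    return {
        "total_results": len(records),
        "benchmarks": len({r["dataset"] for r in records}),
        "models": len({r["model"] for r in records}),
    }


records = json.loads(SAMPLE_JSON)
summary = summarize(records)
print(summary)
```

Run against the full file, the same function would be expected to reproduce the counts shown above, provided the assumed field names match the real ones.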
Complete Results Table
Pending Verification
These results are claimed in papers or documentation but still need manual verification against the source.
| Model | Dataset | Claimed Value | Status |
|---|---|---|---|
| trocr-large | sroie | 96.58 | needs-pdf-verification |
| trocr-large | iam | 2.89 | needs-pdf-verification |
| paddleocr-v4 | icdar-2015 | Unknown | needs-documentation-verification |
| polish-roberta-ocr | poleval-2021-ocr | Unknown | |
| polish-t5-ocr | poleval-2021-ocr | Unknown | |
| herbert | poleval-2021-ocr | Unknown | |
| abbyy-finereader | impact-psnc | Unknown | |
| tesseract-polish | impact-psnc | Unknown | |
| abbyy-finereader | impact-psnc | Unknown | |
| tesseract-polish | impact-psnc | Unknown | |
| tesseract-polish | codesota-polish | Unknown | |
| tesseract-polish | codesota-polish | Unknown | |
| tesseract-polish | codesota-polish | Unknown | |
| tesseract-polish | codesota-polish-wikipedia | Unknown | |
| tesseract-polish | codesota-polish-real | Unknown | |
| tesseract-polish | codesota-polish-synth-random | Unknown | |
| tesseract-polish | codesota-polish-synth-words | Unknown | |
| claude-sonnet-4 | swe-bench-verified | Unknown | |
| claude-sonnet-4-high-compute | swe-bench-verified | Unknown | |
| claude-opus-4.5 | swe-bench-verified | Unknown | |
| o3 | swe-bench-verified | Unknown | |
| claude-3.7-sonnet | swe-bench-verified | Unknown | |
| claude-3.5-sonnet | swe-bench-verified | Unknown | |
| o1 | swe-bench-verified | Unknown | |
| gpt-4o | swe-bench-verified | Unknown | |
| o3 | aime-2024 | Unknown | |
| o1 | aime-2024 | Unknown | |
| deepseek-r1 | aime-2024 | Unknown | |
| o1 | aime-2024 | Unknown | |
| gpt-4o | aime-2024 | Unknown | |
| o3 | gpqa-diamond | Unknown | |
| gemini-2.5-pro | gpqa-diamond | Unknown | |
| o1 | gpqa-diamond | Unknown | |
| o3-mini | gpqa-diamond | Unknown | |
| claude-3.5-sonnet | gpqa-diamond | Unknown | |
| gpt-4o | gpqa-diamond | Unknown | |
Data Quality
All benchmark results are sourced from AlphaXiv benchmark leaderboards. Each data point includes the source URL for verification.
Results marked as "pending verification" are claimed in papers but have not been independently confirmed. We do not include estimated or interpolated values.
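The policy above can be sketched as a simple filter that separates pending entries from confirmed ones and never fills in a missing value. This is a minimal sketch, not the site's actual pipeline: the field names, the `None` encoding for "Unknown", the `needs-` status prefix check, and the helper `is_pending` are all assumptions for illustration.

```python
# Hypothetical records. Field names and None-for-"Unknown" are assumptions.
RECORDS = [
    {"model": "trocr-large", "dataset": "sroie", "value": 96.58,
     "status": "needs-pdf-verification"},
    {"model": "paddleocr-v4", "dataset": "icdar-2015", "value": None,
     "status": "needs-documentation-verification"},
    {"model": "o3", "dataset": "aime-2024", "value": None, "status": ""},
    # Placeholder entry only to exercise the "verified" branch; not real data.
    {"model": "example-model", "dataset": "example-dataset", "value": 1.0,
     "status": "verified"},
]


def is_pending(record):
    """Treat a record as pending if its value is unknown or its status
    starts with 'needs-' (an assumed convention based on the table)."""
    status = (record.get("status") or "").strip()
    return record.get("value") is None or status.startswith("needs-")


# Split without estimating or interpolating anything: unknown stays unknown.
pending = [r for r in RECORDS if is_pending(r)]
verified = [r for r in RECORDS if not is_pending(r)]

print(len(pending), "pending;", len(verified), "verified")
```

Keeping unknown values as `None` rather than substituting a default means downstream averages or rankings cannot silently include unconfirmed numbers.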