Codesota · OCR · Results
Every scored run, linked to source.
Updated 2026-04-20
§ 00 · Register
Every OCR score, traceable.
1008 benchmark results across 112 datasets and 598 distinct models. Every row links to its original source.
Raw data is public at /data/benchmarks.json. Nothing is interpolated; pending claims are listed separately below.
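The raw file can be queried directly. A minimal sketch, assuming each entry in /data/benchmarks.json mirrors the table columns on this page (the field names `model`, `dataset`, `metric`, `value` are an assumption, not a documented schema), picks out the best run per dataset and metric:

```python
def best_per_dataset(rows):
    """Return the highest-valued run for each (dataset, metric) pair.

    Assumes higher is better; for lower-is-better metrics such as
    cer or the edit-distance scores, take the minimum instead.
    """
    best = {}
    for row in rows:
        key = (row["dataset"], row["metric"])
        if key not in best or row["value"] > best[key]["value"]:
            best[key] = row
    return best


# Sample rows copied from the register below, standing in for the
# full parsed JSON (e.g. json.load(open("benchmarks.json"))).
sample = [
    {"model": "NV-Embed-v2", "dataset": "beir", "metric": "ndcg@10", "value": 62.65},
    {"model": "GTE-Qwen2-7B-instruct", "dataset": "beir", "metric": "ndcg@10", "value": 60.25},
    {"model": "ColBERTv2", "dataset": "beir", "metric": "ndcg@10", "value": 49.4},
]

top = best_per_dataset(sample)
print(top[("beir", "ndcg@10")]["model"])  # NV-Embed-v2
```

Because the register keeps every scored run rather than one row per model, grouping by `(dataset, metric)` rather than by model alone is the safer aggregation.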
§ 01 · Full register
Complete results, in order of submission.
| Model | Dataset | Metric | Value | Source |
|---|---|---|---|---|
| Peng et al. 2023 (WRN-70-16) | robustbench-cifar10-linf | Robust Accuracy | 71.07 | codesota-api |
| Wang et al. 2023 (WRN-70-16) | robustbench-cifar10-linf | Robust Accuracy | 70.69 | codesota-api |
| Gowal et al. 2021 (WRN-70-16) | robustbench-cifar10-linf | Robust Accuracy | 66.11 | codesota-api |
| Grounding DINO 1.5 Pro | lvis-zero-shot | ap | 47.6 | codesota-api |
| OWLv2 (ViT-L) | lvis-zero-shot | ap | 44.6 | codesota-api |
| YOLO-World v2-X | lvis-zero-shot | ap | 35.4 | codesota-api |
| CodeLlama-34B | apps | pass@5 | 32.81 | codesota-api |
| CodeLlama-13B | apps | pass@5 | 23.74 | codesota-api |
| CodeLlama-7B | apps | pass@5 | 10.76 | codesota-api |
| Ultravox-GLM-4P7 | voicebench | overall-score | 88.86 | codesota-api |
| Whisper-v3-large + GPT-4o (cascade) | voicebench | overall-score | 87.8 | codesota-api |
| GPT-4o-Audio | voicebench | overall-score | 86.75 | codesota-api |
| Whisper-v3-large + LLaMA-3.1-8B (cascade) | voicebench | overall-score | 77.48 | codesota-api |
| Kimi-Audio | voicebench | overall-score | 76.91 | codesota-api |
| MiniCPM-o | voicebench | overall-score | 71.23 | codesota-api |
| VITA-1.5 | voicebench | overall-score | 64.53 | codesota-api |
| Qwen2-Audio | voicebench | overall-score | 55.8 | codesota-api |
| LLaMA-Omni | voicebench | overall-score | 41.12 | codesota-api |
| VITA-1.0 | voicebench | overall-score | 36.43 | codesota-api |
| Mini-Omni2 | voicebench | overall-score | 33.49 | codesota-api |
| Mini-Omni | voicebench | overall-score | 30.42 | codesota-api |
| Moshi | voicebench | overall-score | 29.51 | codesota-api |
| Qwen2.5-VL 72B | textvqa | accuracy | 85.5 | codesota-api |
| Qwen2-VL 72B | textvqa | accuracy | 84.9 | codesota-api |
| InternVL2-76B | textvqa | accuracy | 84.4 | codesota-api |
| Llama 3.2 Vision 90B | textvqa | accuracy | 83.4 | codesota-api |
| Gemini 1.5 Pro | textvqa | accuracy | 82.2 | codesota-api |
| GPT-4V | textvqa | accuracy | 78 | codesota-api |
| GPT-4o | textvqa | accuracy | 77.4 | codesota-api |
| LLaVA-1.5 | textvqa | accuracy | 61.3 | codesota-api |
| BLIP-2 | textvqa | accuracy | 42.5 | codesota-api |
| Qwen2.5-VL-72B | ocrbench-v2 | overall-zh-private | 63.7 | codesota-api |
| seed-1.6-vision | ocrbench-v2 | overall-en-private | 62.2 | codesota-api |
| gemini-25-pro | ocrbench-v2 | overall-zh-private | 62.2 | codesota-api |
| Qwen2.5-VL-72B | ocrbench-v2 | overall-en-private | 61.5 | codesota-api |
| qwen3-omni-30b | ocrbench-v2 | overall-en-private | 61.3 | codesota-api |
| nemotron-nano-v2-vl | ocrbench-v2 | overall-en-private | 61.2 | codesota-api |
| Qianfan-OCR | ocrbench-v2 | overall-zh-private | 60.77 | codesota-api |
| gemini-25-pro | ocrbench-v2 | overall-en-private | 59.3 | codesota-api |
| minicpm-v-4.5-8b | ocrbench-v2 | overall-zh-private | 58.8 | codesota-api |
| sail-vl2-8b | ocrbench-v2 | overall-zh-private | 57.6 | codesota-api |
| llama-3.1-nemotron-nano-vl-8b | ocrbench-v2 | overall-en-private | 56.4 | codesota-api |
| Qianfan-OCR | ocrbench-v2 | overall-en-private | 56 | codesota-api |
| InternVL3-14B | ocrbench-v2 | overall-zh-public | 55.7 | codesota-api |
| Qwen2.5-VL-7B | ocrbench-v2 | overall-zh-public | 55.6 | codesota-api |
| gpt-4o | ocrbench-v2 | overall-en-private | 55.5 | codesota-api |
| ovis2.5-8b | ocrbench-v2 | overall-en-private | 54.1 | codesota-api |
| InternVL3-14B | ocrbench-v2 | overall-en-public | 52.6 | codesota-api |
| Gemini 1.5 Pro | ocrbench-v2 | overall-en-public | 51.9 | codesota-api |
| gemini-1.5-pro | ocrbench-v2 | overall-en-private | 51.6 | codesota-api |
| sail-vl2-8b | ocrbench-v2 | overall-en-private | 49.3 | codesota-api |
| Ovis2-8B | ocrbench-v2 | overall-zh-public | 49.2 | codesota-api |
| claude-3.5-sonnet | ocrbench-v2 | overall-zh-private | 48.4 | codesota-api |
| minicpm-v-4.5-8b | ocrbench-v2 | overall-en-private | 48.4 | codesota-api |
| Qwen2-VL-72B | ocrbench-v2 | overall-en-private | 47.8 | codesota-api |
| Ovis2-8B | ocrbench-v2 | overall-en-public | 47.7 | codesota-api |
| gpt-4o-2024 | ocrbench-v2 | overall-en-private | 47.6 | codesota-api |
| claude-3.5-sonnet | ocrbench-v2 | overall-en-private | 47.5 | codesota-api |
| internvl3.5-14b | ocrbench-v2 | overall-en-private | 47.1 | codesota-api |
| step-1v | ocrbench-v2 | overall-en-private | 46.8 | codesota-api |
| Qwen2.5-VL-7B | ocrbench-v2 | overall-en-public | 46.7 | codesota-api |
| Step-1V | ocrbench-v2 | overall-en-public | 46.7 | codesota-api |
| GPT-4o | ocrbench-v2 | overall-en-public | 46.5 | codesota-api |
| InternVL2.5-78B | ocrbench-v2 | overall-zh-private | 46.2 | codesota-api |
| Qwen2-VL-72B | ocrbench-v2 | overall-zh-private | 46.1 | codesota-api |
| gpt-4o-2024 | ocrbench-v2 | overall-zh-private | 45.7 | codesota-api |
| Claude 3.5 Sonnet | ocrbench-v2 | overall-en-public | 45.2 | codesota-api |
| MiniCPM-o-2.6 | ocrbench-v2 | overall-en-public | 45.1 | codesota-api |
| InternVL2.5-78B | ocrbench-v2 | overall-en-private | 45 | codesota-api |
| grok4 | ocrbench-v2 | overall-en-private | 45 | codesota-api |
| gpt-4o-mini | ocrbench-v2 | overall-en-private | 44.1 | codesota-api |
| DeepSeek-VL2-Small | ocrbench-v2 | overall-en-public | 43.3 | codesota-api |
| Gemini 1.5 Pro | ocrbench-v2 | overall-zh-public | 43.1 | codesota-api |
| DeepSeek-VL2-Small | ocrbench-v2 | overall-zh-public | 42.7 | codesota-api |
| GLM-4V-9B | ocrbench-v2 | overall-en-public | 42.6 | codesota-api |
| Step-1V | ocrbench-v2 | overall-zh-public | 42.6 | codesota-api |
| claude-sonnet-4 | ocrbench-v2 | overall-en-private | 42.4 | codesota-api |
| qwen2.5-vl-7b | ocrbench-v2 | overall-en-private | 41.8 | codesota-api |
| MiniCPM-o-2.6 | ocrbench-v2 | overall-zh-public | 41.1 | codesota-api |
| deepseek-vl2-small | ocrbench-v2 | overall-en-private | 41 | codesota-api |
| Pixtral-12B | ocrbench-v2 | overall-en-public | 40.3 | codesota-api |
| Claude 3.5 Sonnet | ocrbench-v2 | overall-zh-public | 39.6 | codesota-api |
| pixtral-12b | ocrbench-v2 | overall-en-private | 38.4 | codesota-api |
| phi-4-multimodal | ocrbench-v2 | overall-en-private | 38.1 | codesota-api |
| glm-4v-9b | ocrbench-v2 | overall-en-private | 37.1 | codesota-api |
| GLM-4V-9B | ocrbench-v2 | overall-zh-public | 36.6 | codesota-api |
| LLaVA-OneVision-7B | ocrbench-v2 | overall-en-public | 36.4 | codesota-api |
| Cambrian-1-8B | ocrbench-v2 | overall-en-public | 34.7 | codesota-api |
| Molmo-7B | ocrbench-v2 | overall-en-public | 34.5 | codesota-api |
| molmo-7b | ocrbench-v2 | overall-en-private | 33.9 | codesota-api |
| llava-ov-7b | ocrbench-v2 | overall-en-private | 33.7 | codesota-api |
| GPT-4o | ocrbench-v2 | overall-zh-public | 32.2 | codesota-api |
| LLaVA-NeXT-8B | ocrbench-v2 | overall-en-public | 31.5 | codesota-api |
| idefics3-8b | ocrbench-v2 | overall-en-private | 26 | codesota-api |
| mistral-ocr-2512 | ocrbench-v2 | overall-en-private | 25.2 | codesota-api |
| TextMonkey | ocrbench-v2 | overall-en-public | 23.9 | codesota-api |
| docowl2 | ocrbench-v2 | overall-en-private | 23.4 | codesota-api |
| Monkey | ocrbench-v2 | overall-en-public | 23.1 | codesota-api |
| LLaVA-OneVision-7B | ocrbench-v2 | overall-zh-public | 17.8 | codesota-api |
| TextMonkey | ocrbench-v2 | overall-zh-public | 15.8 | codesota-api |
| Pixtral-12B | ocrbench-v2 | overall-zh-public | 14.6 | codesota-api |
| Monkey | ocrbench-v2 | overall-zh-public | 13.1 | codesota-api |
| Molmo-7B | ocrbench-v2 | overall-zh-public | 12.8 | codesota-api |
| Cambrian-1-8B | ocrbench-v2 | overall-zh-public | 9.9 | codesota-api |
| LLaVA-NeXT-8B | ocrbench-v2 | overall-zh-public | 9.1 | codesota-api |
| NV-Embed-v2 | beir | ndcg@10 | 62.65 | codesota-api |
| GTE-Qwen2-7B-instruct | beir | ndcg@10 | 60.25 | codesota-api |
| E5-Mistral-7B-instruct | beir | ndcg@10 | 56.9 | codesota-api |
| ColBERTv2 | beir | ndcg@10 | 49.4 | codesota-api |
| RankLLaMA-7B | ms-marco | mrr@10 | 41.8 | codesota-api |
| jina-reranker-v2-base-multilingual | ms-marco | mrr@10 | 41.2 | codesota-api |
| ColBERTv2 | ms-marco | mrr@10 | 39.7 | codesota-api |
| MonoT5-3B | ms-marco | mrr@10 | 39 | codesota-api |
| NV-Embed-v2 | mteb | avg-score | 72.31 | codesota-api |
| GTE-Qwen2-7B-instruct | mteb | avg-score | 72.05 | codesota-api |
| voyage-3-large | mteb | avg-score | 70.32 | codesota-api |
| E5-Mistral-7B-instruct | mteb | avg-score | 66.63 | codesota-api |
| jina-embeddings-v3 | mteb | avg-score | 65.18 | codesota-api |
| text-embedding-3-large | mteb | avg-score | 64.6 | codesota-api |
| GTE-Qwen2-7B-instruct | sts-benchmark | spearman | 88.4 | codesota-api |
| E5-Mistral-7B-instruct | sts-benchmark | spearman | 84.7 | codesota-api |
| all-MiniLM-L6-v2 | sts-benchmark | spearman | 82.8 | codesota-api |
| bestfitting (1st place ensemble) | severstal-steel-defect | Dice | 0.90883 | codesota-api |
| 2nd Place Solution | severstal-steel-defect | Dice | 0.9084 | codesota-api |
| U-Net Ensemble (Pavlov) | severstal-steel-defect | Dice | 0.903 | codesota-api |
| Kling 1.0 | vbench | total-score | 85.37 | codesota-api |
| Runway Gen-3 Alpha | vbench | total-score | 85.22 | codesota-api |
| CogVideoX-5B | vbench | total-score | 82.75 | codesota-api |
| Open-Sora 1.2 | vbench | total-score | 80.91 | codesota-api |
| BiGTex | ogb | accuracy-ogbn-products | 90.29 | codesota-api |
| BiGTex | ogb | accuracy-ogbn-arxiv | 88.51 | codesota-api |
| GLEM+GIANT+SAGN+SCR | ogb | accuracy-ogbn-products | 87.37 | codesota-api |
| LD+GIANT+SAGN+SCR | ogb | accuracy-ogbn-products | 87.18 | codesota-api |
| GraDBERT & RevGAT+KD | ogb | accuracy-ogbn-products | 86.92 | codesota-api |
| GraphSAGE | ogb | accuracy-ogbn-products | 83.89 | codesota-api |
| GCN | ogb | accuracy-ogbn-products | 82.33 | codesota-api |
| GAT | ogb | accuracy-ogbn-products | 80.99 | codesota-api |
| SimTeG+TAPE+RevGAT | ogb | accuracy-ogbn-arxiv | 78.03 | codesota-api |
| TAPE+RevGAT | ogb | accuracy-ogbn-arxiv | 77.5 | codesota-api |
| SimTeG+TAPE+GraphSAGE | ogb | accuracy-ogbn-arxiv | 77.48 | codesota-api |
| LD+REVGAT | ogb | accuracy-ogbn-arxiv | 77.26 | codesota-api |
| GraDBERT & RevGAT+KD | ogb | accuracy-ogbn-arxiv | 77.21 | codesota-api |
| GLEM+RevGAT | ogb | accuracy-ogbn-arxiv | 76.94 | codesota-api |
| GCN | ogb | accuracy-ogbn-arxiv | 73.6 | codesota-api |
| GAT | ogb | accuracy-ogbn-arxiv | 73.3 | codesota-api |
| GraphSAGE | ogb | accuracy-ogbn-arxiv | 72.95 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | multi-image-reasoning | 53.65 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | grounded-qa | 52.93 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | grounded-qa | 51 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | multi-image-reasoning | 50.28 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | knowledge-images-qa | 49.33 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | multi-image-reasoning | 48.68 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | grounded-qa | 48.6 | codesota-api |
| InstructBLIP | demon-bench | multi-image-reasoning | 48.55 | codesota-api |
| InstructBLIP | demon-bench | grounded-qa | 47.4 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | knowledge-images-qa | 44.93 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | knowledge-images-qa | 44.93 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | grounded-qa | 44.8 | codesota-api |
| InstructBLIP | demon-bench | knowledge-images-qa | 44.4 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | multi-image-reasoning | 44.03 | codesota-api |
| Otter | demon-bench | multi-image-reasoning | 43.85 | codesota-api |
| MiniGPT-4 | demon-bench | multi-image-reasoning | 43.5 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | multimodal-dialogue | 42.7 | codesota-api |
| mPLUG-Owl | demon-bench | multi-image-reasoning | 42.5 | codesota-api |
| Otter | demon-bench | grounded-qa | 41.67 | codesota-api |
| OpenFlamingo | demon-bench | multi-image-reasoning | 41.63 | codesota-api |
| LLaVA | demon-bench | multi-image-reasoning | 41.53 | codesota-api |
| BLIP-2 | demon-bench | multi-image-reasoning | 39.65 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | accuracy | 39.28 | codesota-api |
| BLIP-2 | demon-bench | grounded-qa | 39.23 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | multimodal-dialogue | 38.14 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | multimodal-dialogue | 37.5 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | accuracy | 37.22 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | accuracy | 36.37 | codesota-api |
| LLaVA | demon-bench | grounded-qa | 36.2 | codesota-api |
| InstructBLIP | demon-bench | multimodal-dialogue | 33.58 | codesota-api |
| BLIP-2 | demon-bench | knowledge-images-qa | 33.53 | codesota-api |
| mPLUG-Owl | demon-bench | grounded-qa | 33.27 | codesota-api |
| InstructBLIP | demon-bench | accuracy | 33 | codesota-api |
| mPLUG-Owl | demon-bench | knowledge-images-qa | 32.47 | codesota-api |
| OpenFlamingo | demon-bench | grounded-qa | 32 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | knowledge-images-qa | 32 | codesota-api |
| OpenFlamingo | demon-bench | knowledge-images-qa | 30.6 | codesota-api |
| MiniGPT-4 | demon-bench | grounded-qa | 30.27 | codesota-api |
| LLaVA | demon-bench | knowledge-images-qa | 28.33 | codesota-api |
| Otter | demon-bench | knowledge-images-qa | 27.73 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | visual-inference | 27.15 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | relation-cloze | 27.15 | codesota-api |
| BLIP-2 | demon-bench | accuracy | 26.92 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | storytelling | 26.59 | codesota-api |
| MiniGPT-4 | demon-bench | knowledge-images-qa | 26.4 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | accuracy | 26.3 | codesota-api |
| BLIP-2 | demon-bench | multimodal-dialogue | 26.12 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | visual-inference | 25.9 | codesota-api |
| OpenFlamingo | demon-bench | accuracy | 25.83 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | visual-inference | 25.5 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | storytelling | 25.2 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | storytelling | 24.76 | codesota-api |
| Otter | demon-bench | accuracy | 24.51 | codesota-api |
| InstructBLIP | demon-bench | storytelling | 24.41 | codesota-api |
| OpenFlamingo | demon-bench | storytelling | 24.22 | codesota-api |
| mPLUG-Owl | demon-bench | accuracy | 23.13 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | relation-cloze | 22.95 | codesota-api |
| MiniGPT-4 | demon-bench | accuracy | 22.21 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | relation-cloze | 22.15 | codesota-api |
| OpenFlamingo | demon-bench | relation-cloze | 21.65 | codesota-api |
| BLIP-2 | demon-bench | storytelling | 21.31 | codesota-api |
| LLaVA | demon-bench | accuracy | 21.24 | codesota-api |
| InstructBLIP | demon-bench | relation-cloze | 21.2 | codesota-api |
| mPLUG-Owl | demon-bench | storytelling | 19.33 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | relation-cloze | 18 | codesota-api |
| BLIP-2 | demon-bench | relation-cloze | 17.94 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | storytelling | 17.57 | codesota-api |
| MiniGPT-4 | demon-bench | storytelling | 17.07 | codesota-api |
| OpenFlamingo | demon-bench | multimodal-dialogue | 16.88 | codesota-api |
| MiniGPT-4 | demon-bench | relation-cloze | 16.6 | codesota-api |
| mPLUG-Owl | demon-bench | relation-cloze | 16.25 | codesota-api |
| Otter | demon-bench | relation-cloze | 16 | codesota-api |
| LLaVA | demon-bench | relation-cloze | 15.85 | codesota-api |
| Otter | demon-bench | storytelling | 15.57 | codesota-api |
| Otter | demon-bench | multimodal-dialogue | 15.37 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | multimodal-dialogue | 14.22 | codesota-api |
| OpenFlamingo | demon-bench | visual-inference | 13.85 | codesota-api |
| MiniGPT-4 | demon-bench | multimodal-dialogue | 13.69 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | visual-inference | 13.51 | codesota-api |
| mPLUG-Owl | demon-bench | multimodal-dialogue | 12.67 | codesota-api |
| InstructBLIP | demon-bench | visual-inference | 11.49 | codesota-api |
| Otter | demon-bench | visual-inference | 11.39 | codesota-api |
| LLaVA | demon-bench | storytelling | 10.7 | codesota-api |
| BLIP-2 | demon-bench | visual-inference | 10.67 | codesota-api |
| LLaVA | demon-bench | visual-inference | 8.27 | codesota-api |
| MiniGPT-4 | demon-bench | visual-inference | 7.95 | codesota-api |
| LLaVA | demon-bench | multimodal-dialogue | 7.79 | codesota-api |
| mPLUG-Owl | demon-bench | visual-inference | 5.4 | codesota-api |
| InterFuser | carla-leaderboard | driving_score | 76.18 | codesota-api |
| TCP | carla-leaderboard | driving_score | 75.14 | codesota-api |
| Think2Drive | carla-leaderboard | driving_score | 46 | codesota-api |
| mineru-2.5 | omnidocbench | layout-map | 97.5 | codesota-api |
| GLM-OCR | omnidocbench | composite | 94.62 | codesota-api |
| PaddleOCR-VL-1.5 | omnidocbench | composite | 94.5 | codesota-api |
| paddleocr-vl | omnidocbench | table-teds | 93.52 | codesota-api |
| Qianfan-OCR | omnidocbench | composite | 93.12 | codesota-api |
| paddleocr-vl | omnidocbench | composite | 92.86 | codesota-api |
| paddleocr-vl-0.9b | omnidocbench | composite | 92.56 | codesota-api |
| Qianfan-OCR | omnidocbench | formula-cdm | 92.43 | codesota-api |
| mistral-ocr-3 | omnidocbench | reading-order | 91.63 | codesota-api |
| Qianfan-OCR | omnidocbench | table-teds | 91.02 | codesota-api |
| mineru-2.5 | omnidocbench | composite | 90.67 | codesota-api |
| Gemini 3 Pro | omnidocbench | composite | 90.33 | codesota-api |
| Dolphin-v2 | omnidocbench | composite | 89.78 | codesota-api |
| qwen3-vl-235b | omnidocbench | composite | 89.15 | codesota-api |
| monkeyocr-pro-3b | omnidocbench | composite | 88.85 | codesota-api |
| ocrverse-4b | omnidocbench | composite | 88.56 | codesota-api |
| dots-ocr-3b | omnidocbench | composite | 88.41 | codesota-api |
| gemini-25-pro | omnidocbench | composite | 88.03 | codesota-api |
| MonkeyOCR-3B | omnidocbench | composite | 87.13 | codesota-api |
| qwen25-vl | omnidocbench | composite | 87.02 | codesota-api |
| MonkeyOCR-pro-1.2B | omnidocbench | composite | 86.96 | codesota-api |
| PP-StructureV3 | omnidocbench | composite | 86.73 | codesota-api |
| DeepSeek-OCR | omnidocbench | composite | 86.46 | codesota-api |
| clearocr-teamquest | omnidocbench | reading-order | 86.04 | codesota-api |
| Nanonets-OCR-s | omnidocbench | composite | 85.59 | codesota-api |
| MinerU2-VLM | omnidocbench | composite | 85.56 | codesota-api |
| Dolphin-1.5 | omnidocbench | composite | 85.06 | codesota-api |
| InternVL3.5-241B | omnidocbench | composite | 82.67 | codesota-api |
| olmOCR-7B | omnidocbench | composite | 81.79 | codesota-api |
| POINTS-Reader | omnidocbench | composite | 80.98 | codesota-api |
| InternVL3-76B | omnidocbench | composite | 80.33 | codesota-api |
| mistral-ocr-3 | omnidocbench | composite | 79.75 | codesota-api |
| mistral-ocr-2512 | omnidocbench | composite | 79.75 | codesota-api |
| MinerU2-pipeline | omnidocbench | composite | 75.51 | codesota-api |
| GPT-4o | omnidocbench | composite | 75.02 | codesota-api |
| OCRFlux-3B | omnidocbench | composite | 74.82 | codesota-api |
| Dolphin | omnidocbench | composite | 74.67 | codesota-api |
| Marker 1.8.2 | omnidocbench | composite | 71.3 | codesota-api |
| mistral-ocr-3 | omnidocbench | table-teds | 70.88 | codesota-api |
| clearocr-teamquest | omnidocbench | composite | 31.7 | codesota-api |
| clearocr-teamquest | omnidocbench | formula-edit-distance | 0.902 | codesota-api |
| clearocr-teamquest | omnidocbench | table-teds | 0.8 | codesota-api |
| mistral-ocr-3 | omnidocbench | formula-edit-distance | 0.218 | codesota-api |
| clearocr-teamquest | omnidocbench | text-edit-distance | 0.154 | codesota-api |
| mistral-ocr-3 | omnidocbench | text-edit-distance | 0.099 | codesota-api |
| Qianfan-OCR | omnidocbench | text-edit-distance | 0.041 | codesota-api |
| gpt-4o | omnidocbench | ocr-edit-distance | 0.02 | codesota-api |
| RVT-2 | rlbench | success-rate | 81.4 | codesota-api |
| RVT | rlbench | success-rate | 62.9 | codesota-api |
| PerAct | rlbench | success-rate | 43.4 | codesota-api |
| OVRL-V2 | habitat-objectnav-hm3d | success_rate | 64.7 | codesota-api |
| Habitat-Web | habitat-objectnav-hm3d | success_rate | 35.4 | codesota-api |
| WavLLM | audiobench | avg-score | 50.25 | codesota-api |
| SALMONN | audiobench | avg-score | 43.99 | codesota-api |
| Qwen2-Audio-Instruct | audiobench | avg-score | 42.12 | codesota-api |
| Whisper+LLaMA-3 (cascade) | audiobench | avg-score | 40.9 | codesota-api |
| Qwen-Audio-Chat | audiobench | avg-score | 38.59 | codesota-api |
| BEATs | audioset | map | 0.506 | codesota-api |
| AST | audioset | map | 0.485 | codesota-api |
| HTS-AT | audioset | map | 0.471 | codesota-api |
| CLAP | audioset | map | 0.428 | codesota-api |
| TD3 | mujoco | average-return | 5592 | codesota-api |
| SAC | mujoco | average-return | 5179 | codesota-api |
| PPO | mujoco | average-return | 2038 | codesota-api |
| TD-MPC2 (317M params) | mujoco | average-return | 960 | codesota-api |
| TD-MPC2 (19M params) | mujoco | average-return | 953 | codesota-api |
| FOWM | mujoco | average-return | 945 | codesota-api |
| BRO | mujoco | average-return | 941 | codesota-api |
| TD-MPC2 (5M params) | mujoco | average-return | 929 | codesota-api |
| DreamerV3 | mujoco | average-return | 897 | codesota-api |
| TD-MPC | mujoco | average-return | 857 | codesota-api |
| DrQ-v2 | mujoco | average-return | 799 | codesota-api |
| SAC (state-based) | mujoco | average-return | 777 | codesota-api |
| go-explore | atari-2600 | human-normalized-score | 40000 | codesota-api |
| agent57 | atari-2600 | human-normalized-score | 4731.3 | codesota-api |
| MEME | atari-2600 | human-normalized-score | 4087 | codesota-api |
| bbos-1 | atari-2600 | human-normalized-score | 1100 | codesota-api |
| gdi-h3 | atari-2600 | human-normalized-score | 950 | codesota-api |
| dreamerv3 | atari-2600 | human-normalized-score | 840 | codesota-api |
| muzero | atari-2600 | human-normalized-score | 731 | codesota-api |
| EfficientZero V2 | atari-2600 | human-normalized-score | 242.8 | codesota-api |
| rainbow-dqn | atari-2600 | human-normalized-score | 231 | codesota-api |
| BBF (Bigger, Better, Faster) | atari-2600 | human-normalized-score | 224.7 | codesota-api |
| DIAMOND | atari-2600 | human-normalized-score | 145.9 | codesota-api |
| STORM | atari-2600 | human-normalized-score | 126.7 | codesota-api |
| Simulus | atari-2600 | human-normalized-score | 110 | codesota-api |
| DART | atari-2600 | human-normalized-score | 102.2 | codesota-api |
| human-gamer | atari-2600 | human-normalized-score | 100 | codesota-api |
| dqn | atari-2600 | human-normalized-score | 79 | codesota-api |
| SegFormer-B5 | cityscapes | miou | 84 | codesota-api |
| Mask2Former (Swin-L) | cityscapes | miou | 83.3 | codesota-api |
| OneFormer (DiNAT-L) | cityscapes | miou | 83 | codesota-api |
| Qwen2-VL 72B | vqa-v2 | accuracy | 87.6 | codesota-api |
| InternVL2-76B | vqa-v2 | accuracy | 87.2 | codesota-api |
| Gemini 1.5 Pro | vqa-v2 | accuracy | 86.5 | codesota-api |
| PaLI-X 55B | vqa-v2 | accuracy | 86.1 | codesota-api |
| NVLM-D 1.0 72B | vqa-v2 | accuracy | 85.4 | codesota-api |
| NVLM-X 1.0 72B | vqa-v2 | accuracy | 85.2 | codesota-api |
| NVLM-H 1.0 72B | vqa-v2 | accuracy | 85.2 | codesota-api |
| VILA-1.5 40B | vqa-v2 | accuracy | 84.3 | codesota-api |
| LLaVA-NeXT 34B | vqa-v2 | accuracy | 83.7 | codesota-api |
| LLaVA-NeXT 13B | vqa-v2 | accuracy | 82.8 | codesota-api |
| CogVLM-17B | vqa-v2 | accuracy | 82.3 | codesota-api |
| LLaVA-NeXT 7B (Mistral) | vqa-v2 | accuracy | 82.2 | codesota-api |
| BLIP-2 | vqa-v2 | accuracy | 82.19 | codesota-api |
| LLaVA-NeXT 7B (Vicuna) | vqa-v2 | accuracy | 81.8 | codesota-api |
| Pixtral Large | vqa-v2 | accuracy | 80.9 | codesota-api |
| Llama 3-V 405B | vqa-v2 | accuracy | 80.2 | codesota-api |
| LLaVA-1.5 13B | vqa-v2 | accuracy | 80 | codesota-api |
| LLaVA-1.5 | vqa-v2 | accuracy | 80 | codesota-api |
| Llama 3-V 70B | vqa-v2 | accuracy | 79.1 | codesota-api |
| Pixtral-12B | vqa-v2 | accuracy | 78.6 | codesota-api |
| GPT-4o | vqa-v2 | accuracy | 78.5 | codesota-api |
| Llama 3.2 90B Vision Instruct | vqa-v2 | accuracy | 78.1 | codesota-api |
| GPT-4V | vqa-v2 | accuracy | 77.2 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | base | 99.9 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | base | 99.7 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | base | 99.6 | codesota-api |
| Qianfan-OCR | olmocr-bench | base | 99.6 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | headers-footers | 96.1 | codesota-api |
| olmocr-v0.3.0 | olmocr-bench | headers-footers | 95.1 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | long-tiny-text | 92.3 | codesota-api |
| Qianfan-OCR | olmocr-bench | multi-column | 92.2 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | long-tiny-text | 91.4 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | headers-footers | 90.8 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | arxiv | 89.6 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | tables | 89 | codesota-api |
| dots-ocr-3b | olmocr-bench | tables | 88.3 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | tables | 88 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | old-scans-math | 85.6 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | tables | 84.9 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | multi-column | 84.8 | codesota-api |
| dots.mocr | olmocr-bench | pass-rate | 83.9 | codesota-api |
| marker-1.10.0 | olmocr-bench | arxiv | 83.8 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | multi-column | 83.7 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | pass-rate | 83.2 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | pass-rate | 83.1 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | arxiv | 83 | codesota-api |
| infinity-parser-7b | olmocr-bench | pass-rate | 82.5 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | pass-rate | 82.4 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | old-scans-math | 82.3 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | arxiv | 82.2 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | long-tiny-text | 81.9 | codesota-api |
| Qianfan-OCR | olmocr-bench | tables | 81.6 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | multi-column | 81.2 | codesota-api |
| Qianfan-OCR | olmocr-bench | long-tiny-text | 80.4 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | old-scans-math | 80.3 | codesota-api |
| Qianfan-OCR | olmocr-bench | arxiv | 80.1 | codesota-api |
| paddleocr-vl | olmocr-bench | pass-rate | 80 | codesota-api |
| olmocr-v0.3.0 | olmocr-bench | old-scans-math | 79.9 | codesota-api |
| Qianfan-OCR | olmocr-bench | pass-rate | 79.8 | codesota-api |
| Qwen3-VL-4B | olmocr-bench | pass-rate | 79.2 | codesota-api |
| PaddleOCR-VL-1.5 | olmocr-bench | pass-rate | 79.1 | codesota-api |
| dots-ocr-3b | olmocr-bench | pass-rate | 79.1 | codesota-api |
| mistral-ocr-3 | olmocr-bench | pass-rate | 78 | codesota-api |
| marker-1.10.0 | olmocr-bench | pass-rate | 76.5 | codesota-api |
| marker-1.10.1 | olmocr-bench | pass-rate | 76.1 | codesota-api |
| MonkeyOCR-pro-3B | olmocr-bench | pass-rate | 75.8 | codesota-api |
| deepseek-ocr | olmocr-bench | pass-rate | 75.7 | codesota-api |
| mineru-2.5 | olmocr-bench | pass-rate | 75.2 | codesota-api |
| Qianfan-OCR | olmocr-bench | old-scans | 73.1 | codesota-api |
| mistral-ocr-api | olmocr-bench | pass-rate | 72 | codesota-api |
| gpt-4o-anchored | olmocr-bench | pass-rate | 69.9 | codesota-api |
| nanonets-ocr2-3b | olmocr-bench | pass-rate | 69.5 | codesota-api |
| gemini-flash-2 | olmocr-bench | pass-rate | 63.8 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | old-scans | 50.4 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | old-scans | 47.7 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | old-scans | 42.2 | codesota-api |
| Qianfan-OCR | olmocr-bench | headers-footers | 42 | codesota-api |
| gpt-4o | olmocr-bench | old-scans | 40.7 | codesota-api |
| Med-Gemini | medqa-usmle | Accuracy | 91.1 | codesota-api |
| Med-PaLM 2 | medqa-usmle | Accuracy | 86.5 | codesota-api |
| GPT-4 (base) | medqa-usmle | Accuracy | 86.1 | codesota-api |
| LayoutLMv3-large | funsd | f1 | 92.08 | codesota-api |
| UDOP | funsd | f1 | 91.62 | codesota-api |
| LayoutLMv3-base | funsd | f1 | 90.29 | codesota-api |
| DocFormerv2-large | funsd | f1 | 88.89 | codesota-api |
| LiLT[EN-R2]-base | funsd | f1 | 88.41 | codesota-api |
| DocFormerv2-base | funsd | f1 | 88.37 | codesota-api |
| StructuralLM | funsd | f1 | 85.14 | codesota-api |
| FormNet | funsd | f1 | 84.69 | codesota-api |
| BROS-large | funsd | f1 | 84.52 | codesota-api |
| LayoutLMv2-large | funsd | f1 | 84.2 | codesota-api |
| LayoutLMv2-base | funsd | f1 | 82.76 | codesota-api |
| LayoutLMv1-base | funsd | f1 | 79.27 | codesota-api |
| LayoutLMv1-large | funsd | f1 | 77.89 | codesota-api |
| DeepSeek-R1-0528 | livecodebench | pass@1 | 73.3 | codesota-api |
| Qwen3-235B-A22B | livecodebench | pass@1 | 70.7 | codesota-api |
| DeepSeek-R1 | livecodebench | pass@1 | 65.9 | codesota-api |
| DeepSeek-R1-Distill-Llama-70B | livecodebench | pass@1 | 65.2 | codesota-api |
| OpenAI o1 (Dec 2024) | livecodebench | pass@1 | 63.4 | codesota-api |
| Kimi k1.5 (long-CoT) | livecodebench | pass@1 | 62.5 | codesota-api |
| DeepSeek-R1-Distill-Qwen-32B | livecodebench | pass@1 | 62.1 | codesota-api |
| DeepSeek-R1-Distill-Qwen-14B | livecodebench | pass@1 | 59.1 | codesota-api |
| o1-mini | livecodebench | pass@1 | 53.8 | codesota-api |
| DeepSeek-V3-0324 | livecodebench | pass@1 | 49.2 | codesota-api |
| DeepSeek-R1-Distill-Qwen-7B | livecodebench | pass@1 | 49.1 | codesota-api |
| DeepSeek-R1-Distill-Llama-8B | livecodebench | pass@1 | 49 | codesota-api |
| Kimi k1.5 (short-CoT) | livecodebench | pass@1 | 47.3 | codesota-api |
| Llama 4 Maverick (17B-128E) | livecodebench | pass@1 | 43.4 | codesota-api |
| DeepSeek-V3 | livecodebench | pass@1 | 40.5 | codesota-api |
| Gemma 3 27B IT | livecodebench | pass@1 | 39 | codesota-api |
| Claude 3.5 Sonnet | livecodebench | pass@1 | 38.9 | codesota-api |
| GPT-4o | livecodebench | pass@1 | 32.9 | codesota-api |
| Llama 4 Scout (17B-16E) | livecodebench | pass@1 | 32.8 | codesota-api |
| Gemma 3 12B IT | livecodebench | pass@1 | 32 | codesota-api |
| Qwen2.5-Coder-32B-Instruct | livecodebench | pass@1 | 31.4 | codesota-api |
| Gemma 3 4B IT | livecodebench | pass@1 | 23 | codesota-api |
| o4-mini (high) | humaneval | pass@1 | 99.3 | codesota-api |
| o3-mini (high) | humaneval | pass@1 | 97.6 | codesota-api |
| o4-mini | humaneval | pass@1 | 97.3 | codesota-api |
| o3-mini | humaneval | pass@1 | 96.3 | codesota-api |
| gpt-41 | humaneval | pass@1 | 94.5 | codesota-api |
| GPT-4.1 mini | humaneval | pass@1 | 93.8 | codesota-api |
| Qwen2.5-Coder-32B-Instruct | humaneval | pass@1 | 92.7 | codesota-api |
| o1-preview | humaneval | pass@1 | 92.4 | codesota-api |
| o1-mini | humaneval | pass@1 | 92.4 | codesota-api |
| Claude 3.5 Sonnet (Oct 2024) | humaneval | pass@1 | 92.1 | codesota-api |
| claude-35-sonnet | humaneval | pass@1 | 92 | codesota-api |
| gpt-4o | humaneval | pass@1 | 91 | codesota-api |
| GPT-4o (Nov 2024) | humaneval | pass@1 | 90.2 | codesota-api |
| llama-31-405b | humaneval | pass@1 | 89 | codesota-api |
| gpt-45-preview | humaneval | pass@1 | 88.6 | codesota-api |
| grok-2 | humaneval | pass@1 | 88.4 | codesota-api |
| Qwen2.5-Coder-7B-Instruct | humaneval | pass@1 | 88.4 | codesota-api |
| o3 (high) | humaneval | pass@1 | 88.4 | codesota-api |
| gpt-4-turbo | humaneval | pass@1 | 88.2 | codesota-api |
| Gemma 3 27B IT | humaneval | pass@1 | 87.8 | codesota-api |
| o3 | humaneval | pass@1 | 87.4 | codesota-api |
| gpt-4o-mini | humaneval | pass@1 | 87.2 | codesota-api |
| GPT-4.1 nano | humaneval | pass@1 | 87 | codesota-api |
| Gemma 3 12B IT | humaneval | pass@1 | 85.4 | codesota-api |
| DeepSeek-Coder-V2-Instruct | humaneval | pass@1 | 85.4 | codesota-api |
| claude-3-opus | humaneval | pass@1 | 84.9 | codesota-api |
| Phi-4 (14B) | humaneval | pass@1 | 82.6 | codesota-api |
| deepseek-v3 | humaneval | pass@1 | 82.6 | codesota-api |
| llama-3-70b | humaneval | pass@1 | 81.7 | codesota-api |
| llama-31-70b | humaneval | pass@1 | 80.5 | codesota-api |
| gemini-15-pro | humaneval | pass@1 | 71.9 | codesota-api |
| Gemma 3 4B IT | humaneval | pass@1 | 71.3 | codesota-api |
| DeepSeek-V3 | humaneval | pass@1 | 65.2 | codesota-api |
| DINOv2 ViT-g/14 | imagenet-linear-probe | top1_accuracy | 86.5 | codesota-api |
| DINOv2 ViT-g/14 | imagenet-linear-probe | top-1-accuracy | 86.5 | codesota-api |
| DINOv2 ViT-L/14 | imagenet-linear-probe | top-1-accuracy | 86.3 | codesota-api |
| CLIP ViT-L/14 | imagenet-linear-probe | top-1-accuracy | 85.3 | codesota-api |
| SimCLRv2 (ResNet-152 3x) | imagenet-linear-probe | top1_accuracy | 79.8 | codesota-api |
| MAE ViT-H/14 | imagenet-linear-probe | top-1-accuracy | 77.2 | codesota-api |
| MAE ViT-H/14 | imagenet-linear-probe | top1_accuracy | 76.6 | codesota-api |
| MAE ViT-L/16 | imagenet-linear-probe | top-1-accuracy | 76 | codesota-api |
| Nova 2 | wildasr | cer | 10.1 | codesota-api |
| Qwen2-Audio | wildasr | cer | 9.1 | codesota-api |
| Scribe V1 | wildasr | cer | 8.7 | codesota-api |
| Whisper Large V3 | wildasr | cer | 7.5 | codesota-api |
| Gemini 2.5 Pro | wildasr | cer | 6.7 | codesota-api |
| GPT-4o Transcribe | wildasr | cer | 6.4 | codesota-api |
| Gemini 3 Pro | wildasr | cer | 6.1 | codesota-api |
| Nova 2 | wildasr | wer | 6 | codesota-api |
| Qwen2-Audio | wildasr | wer | 5.8 | codesota-api |
| Whisper Large V3 | wildasr | wer | 4.2 | codesota-api |
| Gemini 2.5 Pro | wildasr | wer | 3.6 | codesota-api |
| Scribe V1 | wildasr | wer | 3.6 | codesota-api |
| Gemini 3 Pro | wildasr | wer | 2.8 | codesota-api |
| GPT-4o Transcribe | wildasr | wer | 2.8 | codesota-api |
| AutoAttack vs Undefended ResNet | robustbench-cifar10-linf-attack | Attack Success Rate | 100 | codesota-api |
| AutoAttack vs Wang 2023 | robustbench-cifar10-linf-attack | Attack Success Rate | 29.3 | codesota-api |
| AutoAttack vs Peng 2023 | robustbench-cifar10-linf-attack | Attack Success Rate | 28.8 | codesota-api |
| TAPE + RevGAT | cora | accuracy | 92.9 | codesota-api |
| AuGLM (T5-large) | cora | accuracy | 91.51 | codesota-api |
| ENGINE | cora | accuracy | 91.48 | codesota-api |
| InstructGLM | cora | accuracy | 90.77 | codesota-api |
| GLEM + RevGAT | cora | accuracy | 88.56 | codesota-api |
| GCNLLMEmb | cora | accuracy | 88.15 | codesota-api |
| LLaGA (Mistral-7B) | cora | accuracy | 87.55 | codesota-api |
| SDGAT | cora | accuracy | 85.29 | codesota-api |
| GCN* (tuned) | cora | accuracy | 85.08 | codesota-api |
| GAT* (tuned) | cora | accuracy | 84.64 | codesota-api |
| SGFormer | cora | accuracy | 84.5 | codesota-api |
| GraphSAGE* (tuned) | cora | accuracy | 84.18 | codesota-api |
| Polynormer | cora | accuracy | 83.25 | codesota-api |
| GOAT | cora | accuracy | 83.18 | codesota-api |
| GAT | cora | accuracy | 83 | codesota-api |
| GraphGPS | cora | accuracy | 82.84 | codesota-api |
| Exphormer | cora | accuracy | 82.77 | codesota-api |
| GraphSAGE | cora | accuracy | 82.68 | codesota-api |
| NodeFormer | cora | accuracy | 82.2 | codesota-api |
| NAGphormer | cora | accuracy | 82.12 | codesota-api |
| GCN | cora | accuracy | 81.5 | codesota-api |
| ViTPose-H | coco-keypoints | ap | 80.9 | codesota-api |
| RTMPose-X | coco-keypoints | ap | 78.8 | codesota-api |
| HRNet-W48 | coco-keypoints | ap | 75.5 | codesota-api |
| GROVER-Large | moleculenet-bbbp | ROC-AUC | 0.94 | codesota-api |
| D-MPNN (ChemProp) | moleculenet-bbbp | ROC-AUC | 0.913 | codesota-api |
| MolCLR | moleculenet-bbbp | ROC-AUC | 0.736 | codesota-api |
| ZoeDepth-N | nyu-depth-v2 | absrel | 0.075 | codesota-api |
| Marigold | nyu-depth-v2 | absrel | 0.055 | codesota-api |
| MiDaS 3.1 (BEiT-512) | nyu-depth-v2 | absrel | 0.048 | codesota-api |
| Depth Anything V1 (ViT-L) | nyu-depth-v2 | absrel | 0.045 | codesota-api |
| Depth Anything V2 (ViT-L) | nyu-depth-v2 | absrel | 0.041 | codesota-api |
| Megatron-BERT | race | accuracy | 90.9 | codesota-api |
| ALBERT (Ensemble) | race | accuracy | 89.4 | codesota-api |
| GPT-4 | xnli | accuracy | 87.4 | codesota-api |
| XLM-RoBERTa-large | xnli | accuracy | 83.6 | codesota-api |
| mDeBERTa-v3-base | xnli | accuracy | 80.8 | codesota-api |
| Puigcerver | rimes | wer | 9.9 | codesota-api |
| GatedHTR | rimes | wer | 8.7 | codesota-api |
| Puigcerver | rimes | cer | 3.21 | codesota-api |
| VAN | rimes | cer | 1.91 | codesota-api |
| GatedHTR | rimes | cer | 1.81 | codesota-api |
| Stable Audio Open | audiocaps-t2a | fad | 2.57 | codesota-api |
| AudioGen Medium | audiocaps-t2a | fad | 1.82 | codesota-api |
| AudioLDM 2 | audiocaps-t2a | fad | 1.42 | codesota-api |
| AudioLDM | audiocaps | fad | 4.48 | codesota-api |
| AudioLDM 2-Full-Large | audiocaps | fad | 1.86 | codesota-api |
| AudioLDM 2-Full | audiocaps | fad | 1.78 | codesota-api |
| TANGO | audiocaps | fad | 1.73 | codesota-api |
| AudioLDM 2-AC-Large | audiocaps | fad | 1.42 | codesota-api |
| EVA-CLIP-18B | imagenet-zero-shot | top-1 | 83.8 | codesota-api |
| SigLIP-SO400M | imagenet-zero-shot | top-1 | 83.2 | codesota-api |
| OpenCLIP ViT-G/14 | imagenet-zero-shot | top-1 | 80.1 | codesota-api |
| CLIP ViT-L/14 | imagenet-zero-shot | top-1 | 75.5 | codesota-api |
| Diffusion-QL | d4rl-halfcheetah-medium | normalized_return | 51.1 | codesota-api |
| IQL (Implicit Q-Learning) | d4rl-halfcheetah-medium | normalized_return | 47.4 | codesota-api |
| CQL (Conservative Q-Learning) | d4rl-halfcheetah-medium | normalized_return | 44 | codesota-api |
| π0 (Pi-Zero) | libero-long | success_rate | 85.2 | codesota-api |
| OpenVLA | libero-long | success_rate | 53.7 | codesota-api |
| Octo-Base | libero-long | success_rate | 51.1 | codesota-api |
| Qwen2.5-Coder-32B | mbpp-plus | pass@1 | 76.4 | codesota-api |
| DeepSeek-V3 | mbpp-plus | pass@1 | 73 | codesota-api |
| GPT-4o | mbpp-plus | pass@1 | 71.2 | codesota-api |
| DeepSeek-Coder-33B | mbpp-plus | pass@1 | 66 | codesota-api |
| Marigold | kitti-depth | absrel | 0.099 | codesota-api |
| MiDaS 3.1 (BEiT-512) | kitti-depth | absrel | 0.058 | codesota-api |
| ZoeDepth-K | kitti-depth | absrel | 0.053 | codesota-api |
| Depth Anything V1 (ViT-L) | kitti-depth | absrel | 0.046 | codesota-api |
| Depth Anything V2 (ViT-L) | kitti-depth | absrel | 0.04 | codesota-api |
| SegNet (class-level) | dagm-2007 | Accuracy | 100 | codesota-api |
| ResNet baseline | dagm-2007 | Accuracy | 99.8 | codesota-api |
| VAN | iam | wer | 16.3 | codesota-api |
| HTR-VT | iam | wer | 14.9 | codesota-api |
| HTR-ConvText | iam | wer | 12.9 | codesota-api |
| VAN | iam | cer | 5 | codesota-api |
| HTR-VT | iam | cer | 4.7 | codesota-api |
| HTR-ConvText | iam | cer | 4 | codesota-api |
| TrOCR-base | iam | cer | 3.42 | codesota-api |
| TrOCR-large | iam | cer | 2.89 | codesota-api |
| Qwen2.5-Coder-32B | humaneval-plus | pass@1 | 87.2 | codesota-api |
| DeepSeek-V3 | humaneval-plus | pass@1 | 86.6 | codesota-api |
| GPT-4o | humaneval-plus | pass@1 | 86 | codesota-api |
| DeepSeek-Coder-V2 | humaneval-plus | pass@1 | 82.3 | codesota-api |
| DeepSeek-Coder-33B | humaneval-plus | pass@1 | 75 | codesota-api |
| HIVE-COTE 2.0 | ucr-archive | mean_accuracy | 88.6 | codesota-api |
| Hydra + MultiRocket | ucr-archive | mean_accuracy | 88.3 | codesota-api |
| InceptionTime | ucr-archive | mean_accuracy | 85 | codesota-api |
| InternVideo2 | kinetics-400 | top-1 | 92.1 | codesota-api |
| VideoMAE V2 (ViT-g) | kinetics-400 | top-1 | 90 | codesota-api |
| ViViT-H | kinetics-400 | top-1 | 84.9 | codesota-api |
| TimeSformer-L | kinetics-400 | top-1 | 80.7 | codesota-api |
| GPT-4 | wmt23 | comet | 84.1 | codesota-api |
| Google Translate | wmt23 | comet | 83.8 | codesota-api |
| DeepL | wmt23 | comet | 83.5 | codesota-api |
| NLLB-3.3B | wmt23 | comet | 81.6 | codesota-api |
| DINOv2 ViT-g/14 | imagenet-knn | top1_accuracy | 83.5 | codesota-api |
| DINOv2 ViT-L/14 | imagenet-knn | top1_accuracy | 83.5 | codesota-api |
| DINO ViT-B/16 | imagenet-knn | top1_accuracy | 76.1 | codesota-api |
| PaLI-X-55B | ok-vqa | accuracy | 66.1 | codesota-api |
| PaLI-17B | ok-vqa | accuracy | 64.5 | codesota-api |
| GPT-4V | ok-vqa | accuracy | 64.28 | codesota-api |
| Flamingo-80B | ok-vqa | accuracy | 57.8 | codesota-api |
| BLIP-2 (FlanT5XXL) | ok-vqa | accuracy | 44.7 | codesota-api |
| EVA-02-L | cifar-100 | accuracy | 97.15 | codesota-api |
| CoAtNet-7 | cifar-100 | accuracy | 96.38 | codesota-api |
| ConvNeXt V2-H | cifar-100 | accuracy | 96.17 | codesota-api |
| MAE ViT-H/14 | cifar-100 | accuracy | 96.08 | codesota-api |
| SwinV2-G | cifar-100 | accuracy | 96.01 | codesota-api |
| DeiT III-H/14 | cifar-100 | accuracy | 95.94 | codesota-api |
| InternImage-XL | cifar-100 | accuracy | 95.77 | codesota-api |
| FasterViT-6 | cifar-100 | accuracy | 95.72 | codesota-api |
| vit-h-14 | cifar-100 | accuracy | 94.55 | codesota-api |
| AIMv2-3B | cifar-100 | accuracy | 94.5 | codesota-api |
| AIMv2-1B | cifar-100 | accuracy | 94.1 | codesota-api |
| ViT-L/16 (IN-21K) | cifar-100 | accuracy | 93.25 | codesota-api |
| efficientnet-b7 | cifar-100 | accuracy | 91.7 | codesota-api |
| vit-b-16 | cifar-100 | accuracy | 91.48 | codesota-api |
| resnet-50 | cifar-100 | accuracy | 78.04 | codesota-api |
| coca-finetuned | imagenet-1k | top-1-accuracy | 91 | codesota-api |
| vit-g-14 | imagenet-1k | top-1-accuracy | 90.45 | codesota-api |
| EVA-02-L | imagenet-1k | top-1-accuracy | 90.056 | codesota-api |
| EVA-Giant | imagenet-1k | top-1-accuracy | 89.79 | codesota-api |
| InternImage-H | imagenet-1k | top-1-accuracy | 89.6 | codesota-api |
| SigLIP-SO400M | imagenet-1k | top-1-accuracy | 89.41 | codesota-api |
| convnext-v2-huge | imagenet-1k | top-1-accuracy | 88.9 | codesota-api |
| ViT-H/14 CLIP (LAION-2B) | imagenet-1k | top-1-accuracy | 88.634 | codesota-api |
| ConvNeXt-XXLarge (CLIP LAION) | imagenet-1k | top-1-accuracy | 88.622 | codesota-api |
| vit-h-14 | imagenet-1k | top-1-accuracy | 88.55 | codesota-api |
| swin-large | imagenet-1k | top-1-accuracy | 87.3 | codesota-api |
| efficientnet-v2-l | imagenet-1k | top-1-accuracy | 85.7 | codesota-api |
| deit-b-distilled | imagenet-1k | top-1-accuracy | 85.2 | codesota-api |
| efficientnet-b7 | imagenet-1k | top-1-accuracy | 84.4 | codesota-api |
| deit-b | imagenet-1k | top-1-accuracy | 83.1 | codesota-api |
| convnext-v2-tiny | imagenet-1k | top-1-accuracy | 83 | codesota-api |
| vit-l-16 | imagenet-1k | top-1-accuracy | 82.7 | codesota-api |
| vit-b-16 | imagenet-1k | top-1-accuracy | 81.2 | codesota-api |
| resnet-50-a3 | imagenet-1k | top-1-accuracy | 80.4 | codesota-api |
| resnet-152 | imagenet-1k | top-1-accuracy | 78.6 | codesota-api |
| efficientnet-b0 | imagenet-1k | top-1-accuracy | 77.1 | codesota-api |
| resnet-50 | imagenet-1k | top-1-accuracy | 76.15 | codesota-api |
| MusicGen-Medium | musiccaps | fad | 4.89 | codesota-api |
| AudioLDM 2-MSD | musiccaps | fad | 4.47 | codesota-api |
| MusicLM | musiccaps | fad | 4 | codesota-api |
| AudioLDM-M | musiccaps | fad | 3.2 | codesota-api |
| AudioLDM 2-Full | musiccaps | fad | 3.13 | codesota-api |
| SAM 2 (Hiera-L) | sa-1b | miou | 62.2 | codesota-api |
| SAM (ViT-H) | sa-1b | miou | 58.1 | codesota-api |
| FastSAM | sa-1b | miou | 57.1 | codesota-api |
| EfficientSAM | sa-1b | miou | 55.5 | codesota-api |
| CogVLM-17B | nocaps | cider | 128.3 | codesota-api |
| PaLI-X-55B | nocaps | cider | 126.3 | codesota-api |
| PaLI-17B | nocaps | cider | 124.4 | codesota-api |
| BLIP-2 (FlanT5XL) | nocaps | cider | 123.7 | codesota-api |
| BLIP-2 (OPT 2.7B) | nocaps | cider | 121.6 | codesota-api |
| BEATs | esc-50 | accuracy | 98.1 | codesota-api |
| HTS-AT | esc-50 | accuracy | 97 | codesota-api |
| AST | esc-50 | accuracy | 95.6 | codesota-api |
| CLAP | esc-50 | accuracy | 93.7 | codesota-api |
| GPT-4 | wikitablequestions | accuracy | 75.3 | codesota-api |
| Claude 3.5 Sonnet | wikitablequestions | accuracy | 73 | codesota-api |
| TAPAS-large | wikitablequestions | accuracy | 48.7 | codesota-api |
| ViT-H/14 (JFT-300M) | cifar-10 | accuracy | 99.5 | codesota-api |
| ViT-L/16 (JFT-300M) | cifar-10 | accuracy | 99.42 | codesota-api |
| BiT-L (ResNet152x4) | cifar-10 | accuracy | 99.37 | codesota-api |
| ViT-H/14 (IN-21K) | cifar-10 | accuracy | 99.27 | codesota-api |
| deit-b-distilled | cifar-10 | accuracy | 99.1 | codesota-api |
| ViT-L/16 (IN-21K) | cifar-10 | accuracy | 99 | codesota-api |
| EfficientNet-B8 (NoisyStudent) | cifar-10 | accuracy | 98.7 | codesota-api |
| convnext-v2-base | cifar-10 | accuracy | 98.7 | codesota-api |
| ViT-B/16 (IN-21K) | cifar-10 | accuracy | 98.13 | codesota-api |
| Swin-B | cifar-10 | accuracy | 98 | codesota-api |
| resnet-50 | cifar-10 | accuracy | 96.01 | codesota-api |
| SLCA (ViT-B/16) | split-cifar100 | average_accuracy | 91.53 | codesota-api |
| DualPrompt (ViT-B/16) | split-cifar100 | average_accuracy | 86.51 | codesota-api |
| L2P (ViT-B/16) | split-cifar100 | average_accuracy | 83.86 | codesota-api |
| Claude 3.5 Sonnet (Oct 2024) | mbpp | pass@1 | 91 | codesota-api |
| Qwen2.5-Coder-32B-Instruct | mbpp | pass@1 | 90.2 | codesota-api |
| DeepSeek-Coder-V2-Instruct | mbpp | pass@1 | 89.4 | codesota-api |
| claude-35-sonnet | mbpp | pass@1 | 89.2 | codesota-api |
| gpt-4o | mbpp | pass@1 | 87.8 | codesota-api |
| GPT-4o (Aug 2024) | mbpp | pass@1 | 86.8 | codesota-api |
| Qwen2.5-Coder-7B-Instruct | mbpp | pass@1 | 83.5 | codesota-api |
| Codestral 22B v0.1 | mbpp | pass@1 | 78.2 | codesota-api |
| Llama 4 Maverick (17B-128E) | mbpp | pass@1 | 77.6 | codesota-api |
| DeepSeek-V3 | mbpp | pass@1 | 75.4 | codesota-api |
| Gemma 3 27B IT | mbpp | pass@1 | 74.4 | codesota-api |
| Gemma 3 12B IT | mbpp | pass@1 | 73 | codesota-api |
| Llama 4 Scout (17B-16E) | mbpp | pass@1 | 67.8 | codesota-api |
| Gemma 3 4B IT | mbpp | pass@1 | 63.2 | codesota-api |
| GPT-4 + AlphaCodium | codecontests | pass@1 | 44 | codesota-api |
| AlphaCode 2 | codecontests | pass@1 | 43 | codesota-api |
| GPT-4 | codecontests | pass@1 | 19 | codesota-api |
| P>M>F (ViT-B, DINO pretrained) | mini-imagenet-5way5shot | accuracy | 95.3 | codesota-api |
| FEAT (ResNet-12) | mini-imagenet-5way5shot | accuracy | 82.05 | codesota-api |
| SSF (ViT-B/16) | vtab-1k | mean_accuracy | 73.1 | codesota-api |
| VPT-Deep (ViT-B/16) | vtab-1k | mean_accuracy | 72 | codesota-api |
| DeBERTa-v3-large | glue-fill-mask | avg-score | 91.37 | codesota-api |
| ALBERT-xxlarge-v2 | glue-fill-mask | avg-score | 89.4 | codesota-api |
| RoBERTa-large | glue-fill-mask | avg-score | 88.5 | codesota-api |
| SimpleNet | mvtec-ad | Image AUROC | 99.6 | codesota-api |
| simplenet | mvtec-ad | auroc | 99.6 | codesota-api |
| fastflow | mvtec-ad | auroc | 99.4 | codesota-api |
| patchcore | mvtec-ad | auroc | 99.1 | codesota-api |
| efficientad | mvtec-ad | auroc | 99.1 | codesota-api |
| PatchCore | mvtec-ad | Image AUROC | 99.1 | codesota-api |
| EfficientAD | mvtec-ad | Image AUROC | 99.1 | codesota-api |
| reverse-distillation | mvtec-ad | auroc | 98.5 | codesota-api |
| cflow-ad | mvtec-ad | auroc | 98.3 | codesota-api |
| draem | mvtec-ad | auroc | 98 | codesota-api |
| padim | mvtec-ad | auroc | 97.9 | codesota-api |
| SVM + hand-crafted features | gdxray-welds | Accuracy | 95.2 | codesota-api |
| ResNet50 CNN | gdxray-welds | Accuracy | 90.26 | codesota-api |
| LlamaParse Agentic | parsebench | accuracy | 84.9 | codesota-api |
| LlamaParse Cost Effective | parsebench | accuracy | 71.9 | codesota-api |
| Google Gemini 3 Flash | parsebench | accuracy | 71 | codesota-api |
| Reducto | parsebench | accuracy | 67.8 | codesota-api |
| Qwen 3 VL | parsebench | accuracy | 62 | codesota-api |
| Azure Document Intelligence | parsebench | accuracy | 59.6 | codesota-api |
| Extend | parsebench | accuracy | 55.8 | codesota-api |
| Dots OCR 1.5 | parsebench | accuracy | 55.8 | codesota-api |
| Docling | parsebench | accuracy | 50.6 | codesota-api |
| Google Cloud Document AI | parsebench | accuracy | 50.4 | codesota-api |
| AWS Textract | parsebench | accuracy | 47.9 | codesota-api |
| OpenAI GPT-5 Mini | parsebench | accuracy | 46.8 | codesota-api |
| LandingAI | parsebench | accuracy | 45.2 | codesota-api |
| Anthropic Haiku 4.5 | parsebench | accuracy | 45.2 | codesota-api |
| swin-v2-large | imagenet-v2 | top-1-accuracy | 84 | microsoft-research |
| convnext-v2-huge | imagenet-v2 | top-1-accuracy | 80.5 | meta-research |
| patchcore | visa | auroc | 92.1 | research-paper |
| simplenet | visa | auroc | 95.5 | research-paper |
| efficientad | visa | auroc | 94.8 | research-paper |
| o3 (high) | math | accuracy | 98.1 | src |
| o4-mini (high) | math | accuracy | 98.2 | src |
| o3-mini | math | accuracy | 97.9 | src |
| o3 | math | accuracy | 97.8 | src |
| o4-mini | math | accuracy | 97.5 | src |
| DeepSeek-R1 | math | accuracy | 97.3 | src |
| Gemini 2.5 Pro | math | accuracy | 97.3 | src |
| o1 | math | accuracy | 96.4 | src |
| Claude 3.7 Sonnet | math | accuracy | 96.2 | src |
| Kimi k1.5 | math | accuracy | 96.2 | src |
| DeepSeek-R1-Zero | math | accuracy | 95.9 | src |
| DeepSeek-R1-Distill-Llama-70B | math | accuracy | 94.5 | src |
| DeepSeek-R1-Distill-Qwen-32B | math | accuracy | 94.3 | src |
| DeepSeek-V3-0324 | math | accuracy | 94 | src |
| QwQ-32B | math | accuracy | 90.6 | src |
| deepseek-v3 | math | accuracy | 90.2 | src |
| o1-mini | math | accuracy | 90 | src |
| GPT-4.5 Preview | math | accuracy | 87.1 | src |
| o1-preview | math | accuracy | 85.5 | src |
| GPT-4.1 | math | accuracy | 82.1 | src |
| gpt-4o | math | accuracy | 76.6 | src |
| Grok 2 | math | accuracy | 76.1 | src |
| Llama 3.1 405B | math | accuracy | 73.8 | src |
| GPT-4 Turbo | math | accuracy | 73.4 | src |
| claude-35-sonnet | math | accuracy | 71.1 | src |
| gpt-4o-mini | math | accuracy | 70.2 | src |
| Llama 3.1 70B | math | accuracy | 68 | src |
| gemini-15-pro | math | accuracy | 67.7 | src |
| Claude 3 Opus | math | accuracy | 60.1 | src |
| U-Net Ensemble (Pavlov) | severstal-steel | Dice | 0.903 | src |
| 2nd Place Solution | severstal-steel | Dice | 0.9084 | src |
| bestfitting (1st place ensemble) | severstal-steel | Dice | 0.90883 | src |
| o1-preview | gsm8k | accuracy | 97.8 | src |
| claude-35-sonnet | gsm8k | accuracy | 96.4 | src |
| llama-3-70b | gsm8k | accuracy | 93 | src |
| gpt-4o | gsm8k | accuracy | 92 | src |
| gemini-15-pro | gsm8k | accuracy | 91.7 | src |
| o3 | mmlu | accuracy | 92.9 | src |
| o1 | mmlu | accuracy | 91.8 | src |
| gpt-45-preview | mmlu | accuracy | 90.8 | src |
| o1-preview | mmlu | accuracy | 90.8 | src |
| gpt-41 | mmlu | accuracy | 90.2 | src |
| o4-mini | mmlu | accuracy | 90 | src |
| llama-31-405b | mmlu | accuracy | 88.6 | src |
| deepseek-v3 | mmlu | accuracy | 88.5 | src |
| claude-35-sonnet | mmlu | accuracy | 88.3 | src |
| grok-2 | mmlu | accuracy | 87.5 | src |
| gpt-4o | mmlu | accuracy | 87.2 | src |
| claude-3-opus | mmlu | accuracy | 86.8 | src |
| gpt-4-turbo | mmlu | accuracy | 86.7 | src |
| gemini-15-pro | mmlu | accuracy | 85.9 | src |
| o3-mini | mmlu | accuracy | 85.9 | src |
| o1-mini | mmlu | accuracy | 85.2 | src |
| llama-31-70b | mmlu | accuracy | 82 | src |
| gpt-4o-mini | mmlu | accuracy | 82 | src |
| llama-3-70b | mmlu | accuracy | 82 | src |
| o3 | gpqa | accuracy | 82.8 | src |
| o4-mini | gpqa | accuracy | 77.6 | src |
| o1 | gpqa | accuracy | 75.7 | src |
| o3-mini | gpqa | accuracy | 74.9 | src |
| o1-preview | gpqa | accuracy | 73.3 | src |
| gpt-45-preview | gpqa | accuracy | 69.5 | src |
| gpt-41 | gpqa | accuracy | 66.3 | src |
| o1-mini | gpqa | accuracy | 60 | src |
| claude-35-sonnet | gpqa | accuracy | 59.4 | src |
| grok-2 | gpqa | accuracy | 56 | src |
| llama-31-405b | gpqa | accuracy | 50.7 | src |
| claude-3-opus | gpqa | accuracy | 50.4 | src |
| gpt-4o | gpqa | accuracy | 49.9 | src |
| gpt-4-turbo | gpqa | accuracy | 49.3 | src |
| gemini-15-pro | gpqa | accuracy | 46.2 | src |
| llama-31-70b | gpqa | accuracy | 41.7 | src |
| gpt-4o-mini | gpqa | accuracy | 40.2 | src |
| o1-preview | aime-2024 | accuracy | 83.3 | src |
| claude-35-opus | aime-2024 | accuracy | 16 | src |
| gpt-4o | aime-2024 | accuracy | 13.4 | src |
| Claude Opus 4.7 | swe-bench-verified | resolve-rate | 87.6 | vendor |
| Claude Opus 4.5 | swe-bench-verified | resolve-rate | 80.9 | src |
| Claude Opus 4.6 | swe-bench-verified | resolve-rate | 80.8 | src |
| Gemini 3.1 Pro | swe-bench-verified | resolve-rate | 80.6 | src |
| MiniMax M2.5 | swe-bench-verified | resolve-rate | 80.2 | src |
| GPT-5.2 Thinking | swe-bench-verified | resolve-rate | 80 | src |
| Claude Sonnet 4.6 | swe-bench-verified | resolve-rate | 79.6 | src |
| Gemini 3 Flash | swe-bench-verified | resolve-rate | 78 | src |
| Claude Sonnet 4.5 | swe-bench-verified | resolve-rate | 77.2 | src |
| Kimi K2.5 | swe-bench-verified | resolve-rate | 76.8 | src |
| GPT-5.1 | swe-bench-verified | resolve-rate | 76.3 | src |
| Gemini 3 Pro | swe-bench-verified | resolve-rate | 76.2 | src |
| GPT-5 | swe-bench-verified | resolve-rate | 74.9 | src |
| MiniMax M2.1 | swe-bench-verified | resolve-rate | 74 | src |
| Claude Haiku 4.5 | swe-bench-verified | resolve-rate | 73.3 | src |
| Claude Sonnet 4 | swe-bench-verified | resolve-rate | 72.7 | src |
| Claude Opus 4 | swe-bench-verified | resolve-rate | 72.5 | src |
| Devstral 2 | swe-bench-verified | resolve-rate | 72.2 | src |
| Qwen3-Coder-480B | swe-bench-verified | resolve-rate | 69.6 | src |
| MiniMax M2 | swe-bench-verified | resolve-rate | 69.4 | src |
| o3 | swe-bench-verified | resolve-rate | 69.1 | src |
| o4-mini | swe-bench-verified | resolve-rate | 68.1 | src |
| DeepSeek V3.1 | swe-bench-verified | resolve-rate | 66 | src |
| Kimi K2 | swe-bench-verified | resolve-rate | 65.8 | src |
| Grok 3 | swe-bench-verified | resolve-rate | 63.8 | src |
| Gemini 2.5 Pro | swe-bench-verified | resolve-rate | 63.8 | src |
| Claude 3.7 Sonnet | swe-bench-verified | resolve-rate | 63.7 | src |
| Gemini 2.5 Flash | swe-bench-verified | resolve-rate | 60.4 | src |
| DeepSeek R1-0528 | swe-bench-verified | resolve-rate | 57.6 | src |
| o3-mini | swe-bench-verified | resolve-rate | 55.8 | src |
| GPT-4.1 | swe-bench-verified | resolve-rate | 54.6 | src |
| Claude 3.5 Sonnet | swe-bench-verified | resolve-rate | 50.8 | src |
| DeepSeek-R1 | swe-bench-verified | resolve-rate | 49.2 | src |
| o1 | swe-bench-verified | resolve-rate | 48.9 | src |
| Devstral Small 2505 | swe-bench-verified | resolve-rate | 46.8 | src |
| DeepSeek V3 | swe-bench-verified | resolve-rate | 42 | src |
| GPT-4o | swe-bench-verified | resolve-rate | 41.2 | src |
| Claude 3.5 Haiku | swe-bench-verified | resolve-rate | 40.6 | src |
| DeepSeek V2.5 | swe-bench-verified | resolve-rate | 37 | src |
| co-detr-swin-l | coco | mAP | 66 | src |
| internimage-h | coco | mAP | 65.4 | src |
| Focal-Stable-DINO | coco | mAP | 64.6 | src |
| dino-swin-l | coco | mAP | 63.3 | src |
| EVA-02-L | coco | mAP | 62.3 | src |
| RF-DETR-2XL | coco | mAP | 60.1 | src |
| D-FINE-X (Objects365) | coco | mAP | 59.3 | src |
| yolov10-x | coco | mAP | 57.4 | src |
| RT-DETRv4-X | coco | mAP | 57 | src |
| DINO-X Pro | coco | mAP | 56 | src |
| D-FINE-X | coco | mAP | 55.8 | src |
| YOLOv9-E | coco | mAP | 55.6 | src |
| efficientdet-d7-x | coco | mAP | 55.1 | src |
| YOLO11x | coco | mAP | 54.7 | src |
| RT-DETRv3-R101 | coco | mAP | 54.6 | src |
| RT-DETRv2-X | coco | mAP | 54.3 | src |
| Grounding DINO 1.5 Pro | coco | mAP | 54.3 | src |
| gemini-15-pro | cc-ocr | multi-scene-f1 | 83.25 | src |
| gemini-15-pro | cc-ocr | multilingual-f1 | 78.97 | src |
| qwen2-vl-72b | cc-ocr | multi-scene-f1 | 77.95 | src |
| internvl2-76b | cc-ocr | multi-scene-f1 | 76.92 | src |
| gpt-4o | cc-ocr | multi-scene-f1 | 76.4 | src |
| gpt-4o | cc-ocr | multilingual-f1 | 73.44 | src |
| claude-35-sonnet | cc-ocr | multi-scene-f1 | 72.87 | src |
| qwen2-vl-72b | cc-ocr | kie-f1 | 71.76 | src |
| gemini-15-pro | cc-ocr | kie-f1 | 67.28 | src |
| claude-35-sonnet | cc-ocr | kie-f1 | 64.58 | src |
| gpt-4o | cc-ocr | kie-f1 | 63.45 | src |
| gemini-15-pro | cc-ocr | document-parsing | 62.37 | src |
| paddleocr | kitab-bench | cer | 0.79 | src |
| easyocr | kitab-bench | cer | 0.58 | src |
| tesseract | kitab-bench | cer | 0.54 | src |
| azure-ocr | kitab-bench | cer | 0.52 | src |
| gpt-4o-mini | kitab-bench | cer | 0.43 | src |
| gpt-4o | kitab-bench | cer | 0.31 | src |
| ain-7b | kitab-bench | cer | 0.2 | src |
| gemini-20-flash | kitab-bench | cer | 0.13 | src |
| claude-sonnet-4 | thaiocrbench | ted-score | 0.84 | src |
| gemini-25-pro | thaiocrbench | ted-score | 0.77 | src |
| qwen25-vl-32b | thaiocrbench | ted-score | 0.765 | src |
| internvl3-14b | thaiocrbench | ted-score | 0.76 | src |
| qwen25-vl-72b | thaiocrbench | ted-score | 0.72 | src |
| gemini-25-pro | mme-videoocr | total-accuracy | 73.7 | src |
| qwen25-vl-72b | mme-videoocr | total-accuracy | 69 | src |
| internvl3-78b | mme-videoocr | total-accuracy | 67.2 | src |
| gpt-4o | mme-videoocr | total-accuracy | 66.4 | src |
| gemini-15-pro | mme-videoocr | total-accuracy | 64.9 | src |
| qwen25-vl-32b | mme-videoocr | total-accuracy | 61 | src |
| chexpert-auc-maximizer | chexpert | auroc | 93 | src |
| biovil | chexpert | auroc | 89.1 | src |
| chexzero | chexpert | auroc | 88.6 | src |
| gloria | chexpert | auroc | 88.2 | src |
| medclip | chexpert | auroc | 87.8 | src |
| torchxrayvision | chexpert | auroc | 87.4 | src |
| densenet-121-cxr | chexpert | auroc | 86.5 | src |
| densenet-121-cxr | rsna-pneumonia | auroc | 88.5 | src |
| chexnet | rsna-pneumonia | auroc | 87.2 | src |
| torchxrayvision | nih-chestxray14 | auroc | 85.8 | src |
| chexnet | nih-chestxray14 | auroc | 84.1 | src |
| densenet-121-cxr | nih-chestxray14 | auroc | 82.6 | src |
| resnet-50-cxr | nih-chestxray14 | auroc | 80.4 | src |
| gpt-4o | svamp | accuracy | 93.7 | src |
| claude-35-sonnet | svamp | accuracy | 91.2 | src |
| llama-3-70b | svamp | accuracy | 89.5 | src |
| claude-35-sonnet | arc-challenge | accuracy | 96.7 | src |
| gpt-4o | arc-challenge | accuracy | 96.4 | src |
| gemini-15-pro | arc-challenge | accuracy | 94.8 | src |
| llama-3-70b | arc-challenge | accuracy | 93 | src |
| gpt-4o | commonsenseqa | accuracy | 85.4 | src |
| claude-35-sonnet | commonsenseqa | accuracy | 83.2 | src |
| llama-3-70b | commonsenseqa | accuracy | 80.9 | src |
| gpt-4o | winogrande | accuracy | 87.5 | src |
| claude-35-sonnet | winogrande | accuracy | 85.4 | src |
| llama-3-70b | winogrande | accuracy | 85.3 | src |
| gpt-4o | hellaswag | accuracy | 95.3 | src |
| gemini-15-pro | hellaswag | accuracy | 92.5 | src |
| claude-35-sonnet | hellaswag | accuracy | 89 | src |
| llama-3-70b | hellaswag | accuracy | 88 | src |
| gpt-4o | hotpotqa | f1 | 71.3 | src |
| claude-35-sonnet | hotpotqa | f1 | 68.5 | src |
| gpt-4o | logiqa | accuracy | 56.3 | src |
| claude-35-sonnet | logiqa | accuracy | 53.8 | src |
| gpt-4o | reclor | accuracy | 72.4 | src |
| claude-35-sonnet | reclor | accuracy | 68.9 | src |
| gpt-4o | strategyqa | accuracy | 82.1 | src |
| claude-35-sonnet | strategyqa | accuracy | 79.8 | src |
| gpt-4o | mawps | accuracy | 97.2 | src |
| claude-35-sonnet | mawps | accuracy | 95.8 | src |
| llama-3-70b | mawps | accuracy | 94.1 | src |
| plymouth-dl-model | abide-i | accuracy | 98 | src |
| mcbert | abide-i | accuracy | 93.4 | src |
| ae-fcn | abide-i | accuracy | 85 | src |
| asd-swnet | abide-i | auc | 81 | src |
| braingt | abide-i | auc | 78.7 | src |
| multi-atlas-dnn | abide-i | accuracy | 78.07 | src |
| gcn | abide-i | auc | 78 | src |
| svm-connectivity | abide-i | auc | 77 | src |
| asd-swnet | abide-i | accuracy | 76.52 | src |
| maacnn | abide-i | accuracy | 75.12 | src |
| al-negat | abide-i | accuracy | 74.7 | src |
| braingnn | abide-i | accuracy | 73.3 | src |
| gcn | abide-i | accuracy | 72.2 | src |
| multi-task-transformer | abide-i | accuracy | 72 | src |
| phgcl-ddgformer | abide-i | accuracy | 70.9 | src |
| svm-connectivity | abide-i | accuracy | 70.1 | src |
| deep-learning-heinsfeld | abide-i | accuracy | 70 | src |
| mvs-gcn | abide-i | accuracy | 69.38 | src |
| mvs-gcn | abide-i | auc | 69.01 | src |
| abraham-connectomes | abide-i | accuracy | 67 | src |
| random-forest | abide-i | accuracy | 63 | src |
| deepasd | abide-ii | auc | 93 | src |
| maacnn | abide-ii | accuracy | 72.88 | src |
| mistral-ocr-3 | internal-mistral | overall-accuracy | 94.9 | src |
| mistral-ocr-3 | ocr-cer-benchmark | cer | 3.7 | src |
| mistral-ocr-3 | ocr-wer-benchmark | wer | 7.1 | src |
| chexzero | mimic-cxr | auroc | 89.2 | src |
| torchxrayvision | mimic-cxr | auroc | 86.3 | src |
| convirt | mimic-cxr | auroc | 85.7 | src |
| rad-dino | vindr-cxr | auroc | 91.2 | src |
| torchxrayvision | vindr-cxr | auroc | 87.9 | src |
| torchxrayvision | padchest | auroc | 84.6 | src |
| densenet-121-cxr | covid-chestxray | auroc | 94.7 | src |
| torchxrayvision | covid-chestxray | auroc | 93.2 | src |
| yolov8-weld | weld-defect-xray | map | 87.3 | src |
| defectdet-resnet | neu-det | map | 78.4 | src |
| mistral-ocr-2512 | codesota-verification | pages-per-second | 1.22 | src |
| ONE-PEACE | ade20k | mIoU | 63 | src |
| internimage-h | ade20k | mIoU | 62.9 | src |
| ViT-Adapter-L (BEiT-3) | ade20k | mIoU | 62.8 | src |
| ViT-CoMer-L | ade20k | mIoU | 62.1 | src |
| DINOv2 ViT-g/14 + Mask2Former | ade20k | mIoU | 60.2 | src |
| EVA-02-L + UperNet | ade20k | mIoU | 60.1 | src |
| EoMT-L (DINOv2) | ade20k | mIoU | 59.5 | src |
| OneFormer (DiNAT-L) | ade20k | mIoU | 58.3 | src |
| mask2former-swin-l | ade20k | mIoU | 57.3 | src |
| Swin-L + UperNet | ade20k | mIoU | 53.5 | src |
| SegMAN-L | ade20k | mIoU | 53.2 | src |
| SegFormer-B5 | ade20k | mIoU | 51.8 | src |
| SeMask-L | ade20k | mIoU | 49.35 | src |
| Codex / GPT-5.5 | terminal-bench-2 | accuracy | 82 | terminal-bench-official |
| ForgeCode / GPT-5.4 | terminal-bench-2 | accuracy | 81.8 | terminal-bench-official |
| TongAgents / Gemini 3.1 Pro | terminal-bench-2 | accuracy | 80.2 | terminal-bench-official |
| ForgeCode / Claude Opus 4.6 | terminal-bench-2 | accuracy | 79.8 | terminal-bench-official |
| SageAgent / GPT-5.3-Codex | terminal-bench-2 | accuracy | 78.4 | terminal-bench-official |
| ForgeCode / Gemini 3.1 Pro | terminal-bench-2 | accuracy | 78.4 | terminal-bench-official |
| Droid / GPT-5.3-Codex | terminal-bench-2 | accuracy | 77.3 | terminal-bench-official |
| Capy / Claude Opus 4.6 | terminal-bench-2 | accuracy | 75.3 | terminal-bench-official |
| Simple Codex / GPT-5.3-Codex | terminal-bench-2 | accuracy | 75.1 | terminal-bench-official |
| Terminus-KIRA / Gemini 3.1 Pro | terminal-bench-2 | accuracy | 74.8 | terminal-bench-official |
| Terminus-KIRA / Claude Opus 4.6 | terminal-bench-2 | accuracy | 74.7 | terminal-bench-official |
| Mux / GPT-5.3-Codex | terminal-bench-2 | accuracy | 74.6 | terminal-bench-official |
| MAYA-V2 / Claude 4.6 Opus | terminal-bench-2 | accuracy | 72.1 | terminal-bench-official |
| TongAgents / Claude Opus 4.6 | terminal-bench-2 | accuracy | 71.9 | terminal-bench-official |
| Junie CLI / Multiple | terminal-bench-2 | accuracy | 71 | terminal-bench-official |
| CodeBrain-1 / GPT-5.3-Codex | terminal-bench-2 | accuracy | 70.3 | terminal-bench-official |
| Droid / Claude Opus 4.6 | terminal-bench-2 | accuracy | 69.9 | terminal-bench-official |
| Ante / Gemini 3 Pro | terminal-bench-2 | accuracy | 69.4 | terminal-bench-official |
| IndusAGI Coding Agent / GPT-5.3-Codex | terminal-bench-2 | accuracy | 69.1 | terminal-bench-official |
| Crux / Claude Opus 4.6 | terminal-bench-2 | accuracy | 66.9 | terminal-bench-official |
Fig 1 · All 1008 scored runs in the OCR register. Each row preserves the submission’s reported metric, numeric value and cited source verbatim.
§ 02 · By dataset
Results grouped by benchmark.
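The grouping this section describes can be sketched as below. The row field names (`model`, `dataset`, `value`) are assumptions for illustration; the register does not document its JSON schema in this section, and the sample rows are taken from the full register above.

```python
from collections import defaultdict

# Sample rows, shaped like entries in the full register (field names assumed).
rows = [
    {"model": "SegFormer-B5", "dataset": "ade20k", "value": 51.8},
    {"model": "Codex / GPT-5.5", "dataset": "terminal-bench-2", "value": 82.0},
    {"model": "SeMask-L", "dataset": "ade20k", "value": 49.35},
]

# Bucket every result under its benchmark.
by_dataset = defaultdict(list)
for r in rows:
    by_dataset[r["dataset"]].append(r)

# Within each benchmark, sort best-first by the reported value.
for group in by_dataset.values():
    group.sort(key=lambda r: r["value"], reverse=True)
```

Note that sorting by raw value only works within one benchmark; across benchmarks the metrics differ (mIoU vs accuracy), so the groups are never merged.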
§ 03 · Pending verification
Claims awaiting reproduction.
These scores appear in published papers or vendor blog posts but have not yet been re-run against the canonical test split. They are kept visible as a signal, but are not treated as evidence. Claimed values are reproduced in whatever metric the source reported, so they are not comparable across rows.
| Model | Dataset | Claimed value | Status |
|---|---|---|---|
| trocr-large | sroie | 96.58 | needs-pdf-verification |
| trocr-large | iam | 2.89 | needs-pdf-verification |
| paddleocr-v4 | icdar-2015 | — | needs-documentation-verification |
| polish-roberta-ocr | poleval-2021-ocr | — | |
| polish-t5-ocr | poleval-2021-ocr | — | |
| herbert | poleval-2021-ocr | — | |
| abbyy-finereader | impact-psnc | — | |
| tesseract-polish | impact-psnc | — | |
| tesseract-polish | codesota-polish | — | |
| tesseract-polish | codesota-polish-wikipedia | — | |
| tesseract-polish | codesota-polish-real | — | |
| tesseract-polish | codesota-polish-synth-random | — | |
| tesseract-polish | codesota-polish-synth-words | — | |
| claude-sonnet-4 | swe-bench-verified | — | |
| claude-sonnet-4-high-compute | swe-bench-verified | — | |
| claude-opus-4.5 | swe-bench-verified | — | |
| o3 | swe-bench-verified | — | |
| claude-3.7-sonnet | swe-bench-verified | — | |
| claude-3.5-sonnet | swe-bench-verified | — | |
| o1 | swe-bench-verified | — | |
| gpt-4o | swe-bench-verified | — | |
| o3 | aime-2024 | — | |
| o1 | aime-2024 | — | |
| deepseek-r1 | aime-2024 | — | |
| gpt-4o | aime-2024 | — | |
| o3 | gpqa-diamond | — | |
| gemini-2.5-pro | gpqa-diamond | — | |
| o1 | gpqa-diamond | — | |
| o3-mini | gpqa-diamond | — | |
| claude-3.5-sonnet | gpqa-diamond | — | |
| gpt-4o | gpqa-diamond | — |
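Since the raw register is published at /data/benchmarks.json, separating claim-only rows from verified ones is a one-line filter. The field names below (`model`, `dataset`, `status`) and the `"verified"` status value are assumptions for illustration; only the `needs-*` statuses appear in the table above.

```python
import json

# Inline sample standing in for /data/benchmarks.json (schema assumed).
sample = json.loads("""[
  {"model": "trocr-large", "dataset": "iam", "status": "needs-pdf-verification"},
  {"model": "Codex / GPT-5.5", "dataset": "terminal-bench-2", "status": "verified"}
]""")

# Claim-only rows carry a "needs-*" status; everything else counts as scored.
pending = [r for r in sample if r["status"].startswith("needs-")]
verified = [r for r in sample if r["status"] == "verified"]
```

In a real consumer the `sample` literal would be replaced by a fetch of /data/benchmarks.json; the filter itself is unchanged.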
§ Final · Methodology
How these numbers stay honest.
Self-reported scores are recorded and labelled claim-only until they are reproduced. Closed API models are run against the public split through their official endpoint, with the model identifier and access date recorded. See the full methodology for what counts as a verified run.
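One way to make that record concrete is a frozen run record. The exact fields here are an assumption; the methodology above only requires that the model identifier and access date be captured alongside the score. The example values are taken from the VoiceBench row in the full register and this page's update date.

```python
from dataclasses import dataclass, asdict

# Hypothetical record for a closed-API run (field names assumed).
@dataclass(frozen=True)
class ApiRun:
    model_id: str   # exact identifier sent to the official endpoint
    dataset: str    # canonical public split
    metric: str
    value: float
    accessed: str   # ISO date the endpoint was queried

run = ApiRun("GPT-4o-Audio", "voicebench", "overall-score", 86.75, "2026-04-20")
```

Freezing the dataclass keeps a logged run immutable once recorded, which matches the register's append-only framing.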
Related OCR reading