Codesota · OCR · Results
Every scored run, linked to source
Updated 2026-04-20
§ 00 · Register

Every OCR score, traceable.

1008 benchmark results across 112 datasets and 598 distinct models. Every row links to its original source.

Raw data is public at /data/benchmarks.json. Nothing is interpolated; pending claims are listed separately below.
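
As a minimal sketch of how the headline counts could be re-derived from the raw data: the snippet below assumes each record is an object with `model`, `dataset`, `metric`, `value` and `source` fields. That layout is an assumption (the actual schema of /data/benchmarks.json is not shown on this page), and the inline sample stands in for the downloaded file.

```python
import json

# Assumed record layout: the field names below are hypothetical and mirror
# the register's column headers, not a documented schema.
SAMPLE_JSON = """
[
  {"model": "Qwen2.5-VL 72B", "dataset": "textvqa", "metric": "accuracy", "value": 85.5, "source": "codesota-api"},
  {"model": "Qwen2-VL 72B", "dataset": "textvqa", "metric": "accuracy", "value": 84.9, "source": "codesota-api"},
  {"model": "Qwen2-VL 72B", "dataset": "vqa-v2", "metric": "accuracy", "value": 87.6, "source": "codesota-api"},
  {"model": "GPT-4o", "dataset": "vqa-v2", "metric": "accuracy", "value": 78.5, "source": "codesota-api"}
]
"""


def summarize(records):
    """Count results, distinct datasets and distinct model names."""
    records = list(records)
    return {
        "results": len(records),
        "datasets": len({r["dataset"] for r in records}),
        "models": len({r["model"] for r in records}),
    }


records = json.loads(SAMPLE_JSON)
summary = summarize(records)
print(summary)
```

Model names are compared as exact strings here, so spellings such as `GPT-4o` and `gpt-4o` would count as two distinct models unless they are normalized first.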

§ 01 · Full register

Complete results, in order of submission.

| Model | Dataset | Metric | Value | Source |
|---|---|---|---|---|
| Peng et al. 2023 (WRN-70-16) | robustbench-cifar10-linf | Robust Accuracy | 71.07 | codesota-api |
| Wang et al. 2023 (WRN-70-16) | robustbench-cifar10-linf | Robust Accuracy | 70.69 | codesota-api |
| Gowal et al. 2021 (WRN-70-16) | robustbench-cifar10-linf | Robust Accuracy | 66.11 | codesota-api |
| Grounding DINO 1.5 Pro | lvis-zero-shot | ap | 47.6 | codesota-api |
| OWLv2 (ViT-L) | lvis-zero-shot | ap | 44.6 | codesota-api |
| YOLO-World v2-X | lvis-zero-shot | ap | 35.4 | codesota-api |
| CodeLlama-34B | apps | pass@5 | 32.81 | codesota-api |
| CodeLlama-13B | apps | pass@5 | 23.74 | codesota-api |
| CodeLlama-7B | apps | pass@5 | 10.76 | codesota-api |
| Ultravox-GLM-4P7 | voicebench | overall-score | 88.86 | codesota-api |
| Whisper-v3-large + GPT-4o (cascade) | voicebench | overall-score | 87.8 | codesota-api |
| GPT-4o-Audio | voicebench | overall-score | 86.75 | codesota-api |
| Whisper-v3-large + LLaMA-3.1-8B (cascade) | voicebench | overall-score | 77.48 | codesota-api |
| Kimi-Audio | voicebench | overall-score | 76.91 | codesota-api |
| MiniCPM-o | voicebench | overall-score | 71.23 | codesota-api |
| VITA-1.5 | voicebench | overall-score | 64.53 | codesota-api |
| Qwen2-Audio | voicebench | overall-score | 55.8 | codesota-api |
| LLaMA-Omni | voicebench | overall-score | 41.12 | codesota-api |
| VITA-1.0 | voicebench | overall-score | 36.43 | codesota-api |
| Mini-Omni2 | voicebench | overall-score | 33.49 | codesota-api |
| Mini-Omni | voicebench | overall-score | 30.42 | codesota-api |
| Moshi | voicebench | overall-score | 29.51 | codesota-api |
| Qwen2.5-VL 72B | textvqa | accuracy | 85.5 | codesota-api |
| Qwen2-VL 72B | textvqa | accuracy | 84.9 | codesota-api |
| InternVL2-76B | textvqa | accuracy | 84.4 | codesota-api |
| Llama 3.2 Vision 90B | textvqa | accuracy | 83.4 | codesota-api |
| Gemini 1.5 Pro | textvqa | accuracy | 82.2 | codesota-api |
| GPT-4V | textvqa | accuracy | 78 | codesota-api |
| GPT-4o | textvqa | accuracy | 77.4 | codesota-api |
| LLaVA-1.5 | textvqa | accuracy | 61.3 | codesota-api |
| BLIP-2 | textvqa | accuracy | 42.5 | codesota-api |
| Qwen2.5-VL-72B | ocrbench-v2 | overall-zh-private | 63.7 | codesota-api |
| seed-1.6-vision | ocrbench-v2 | overall-en-private | 62.2 | codesota-api |
| gemini-25-pro | ocrbench-v2 | overall-zh-private | 62.2 | codesota-api |
| Qwen2.5-VL-72B | ocrbench-v2 | overall-en-private | 61.5 | codesota-api |
| qwen3-omni-30b | ocrbench-v2 | overall-en-private | 61.3 | codesota-api |
| nemotron-nano-v2-vl | ocrbench-v2 | overall-en-private | 61.2 | codesota-api |
| Qianfan-OCR | ocrbench-v2 | overall-zh-private | 60.77 | codesota-api |
| gemini-25-pro | ocrbench-v2 | overall-en-private | 59.3 | codesota-api |
| minicpm-v-4.5-8b | ocrbench-v2 | overall-zh-private | 58.8 | codesota-api |
| sail-vl2-8b | ocrbench-v2 | overall-zh-private | 57.6 | codesota-api |
| llama-3.1-nemotron-nano-vl-8b | ocrbench-v2 | overall-en-private | 56.4 | codesota-api |
| Qianfan-OCR | ocrbench-v2 | overall-en-private | 56 | codesota-api |
| InternVL3-14B | ocrbench-v2 | overall-zh-public | 55.7 | codesota-api |
| Qwen2.5-VL-7B | ocrbench-v2 | overall-zh-public | 55.6 | codesota-api |
| gpt-4o | ocrbench-v2 | overall-en-private | 55.5 | codesota-api |
| ovis2.5-8b | ocrbench-v2 | overall-en-private | 54.1 | codesota-api |
| InternVL3-14B | ocrbench-v2 | overall-en-public | 52.6 | codesota-api |
| Gemini 1.5 Pro | ocrbench-v2 | overall-en-public | 51.9 | codesota-api |
| gemini-1.5-pro | ocrbench-v2 | overall-en-private | 51.6 | codesota-api |
| sail-vl2-8b | ocrbench-v2 | overall-en-private | 49.3 | codesota-api |
| Ovis2-8B | ocrbench-v2 | overall-zh-public | 49.2 | codesota-api |
| claude-3.5-sonnet | ocrbench-v2 | overall-zh-private | 48.4 | codesota-api |
| minicpm-v-4.5-8b | ocrbench-v2 | overall-en-private | 48.4 | codesota-api |
| Qwen2-VL-72B | ocrbench-v2 | overall-en-private | 47.8 | codesota-api |
| Ovis2-8B | ocrbench-v2 | overall-en-public | 47.7 | codesota-api |
| gpt-4o-2024 | ocrbench-v2 | overall-en-private | 47.6 | codesota-api |
| claude-3.5-sonnet | ocrbench-v2 | overall-en-private | 47.5 | codesota-api |
| internvl3.5-14b | ocrbench-v2 | overall-en-private | 47.1 | codesota-api |
| step-1v | ocrbench-v2 | overall-en-private | 46.8 | codesota-api |
| Qwen2.5-VL-7B | ocrbench-v2 | overall-en-public | 46.7 | codesota-api |
| Step-1V | ocrbench-v2 | overall-en-public | 46.7 | codesota-api |
| GPT-4o | ocrbench-v2 | overall-en-public | 46.5 | codesota-api |
| InternVL2.5-78B | ocrbench-v2 | overall-zh-private | 46.2 | codesota-api |
| Qwen2-VL-72B | ocrbench-v2 | overall-zh-private | 46.1 | codesota-api |
| gpt-4o-2024 | ocrbench-v2 | overall-zh-private | 45.7 | codesota-api |
| Claude 3.5 Sonnet | ocrbench-v2 | overall-en-public | 45.2 | codesota-api |
| MiniCPM-o-2.6 | ocrbench-v2 | overall-en-public | 45.1 | codesota-api |
| InternVL2.5-78B | ocrbench-v2 | overall-en-private | 45 | codesota-api |
| grok4 | ocrbench-v2 | overall-en-private | 45 | codesota-api |
| gpt-4o-mini | ocrbench-v2 | overall-en-private | 44.1 | codesota-api |
| DeepSeek-VL2-Small | ocrbench-v2 | overall-en-public | 43.3 | codesota-api |
| Gemini 1.5 Pro | ocrbench-v2 | overall-zh-public | 43.1 | codesota-api |
| DeepSeek-VL2-Small | ocrbench-v2 | overall-zh-public | 42.7 | codesota-api |
| GLM-4V-9B | ocrbench-v2 | overall-en-public | 42.6 | codesota-api |
| Step-1V | ocrbench-v2 | overall-zh-public | 42.6 | codesota-api |
| claude-sonnet-4 | ocrbench-v2 | overall-en-private | 42.4 | codesota-api |
| qwen2.5-vl-7b | ocrbench-v2 | overall-en-private | 41.8 | codesota-api |
| MiniCPM-o-2.6 | ocrbench-v2 | overall-zh-public | 41.1 | codesota-api |
| deepseek-vl2-small | ocrbench-v2 | overall-en-private | 41 | codesota-api |
| Pixtral-12B | ocrbench-v2 | overall-en-public | 40.3 | codesota-api |
| Claude 3.5 Sonnet | ocrbench-v2 | overall-zh-public | 39.6 | codesota-api |
| pixtral-12b | ocrbench-v2 | overall-en-private | 38.4 | codesota-api |
| phi-4-multimodal | ocrbench-v2 | overall-en-private | 38.1 | codesota-api |
| glm-4v-9b | ocrbench-v2 | overall-en-private | 37.1 | codesota-api |
| GLM-4V-9B | ocrbench-v2 | overall-zh-public | 36.6 | codesota-api |
| LLaVA-OneVision-7B | ocrbench-v2 | overall-en-public | 36.4 | codesota-api |
| Cambrian-1-8B | ocrbench-v2 | overall-en-public | 34.7 | codesota-api |
| Molmo-7B | ocrbench-v2 | overall-en-public | 34.5 | codesota-api |
| molmo-7b | ocrbench-v2 | overall-en-private | 33.9 | codesota-api |
| llava-ov-7b | ocrbench-v2 | overall-en-private | 33.7 | codesota-api |
| GPT-4o | ocrbench-v2 | overall-zh-public | 32.2 | codesota-api |
| LLaVA-NeXT-8B | ocrbench-v2 | overall-en-public | 31.5 | codesota-api |
| idefics3-8b | ocrbench-v2 | overall-en-private | 26 | codesota-api |
| mistral-ocr-2512 | ocrbench-v2 | overall-en-private | 25.2 | codesota-api |
| TextMonkey | ocrbench-v2 | overall-en-public | 23.9 | codesota-api |
| docowl2 | ocrbench-v2 | overall-en-private | 23.4 | codesota-api |
| Monkey | ocrbench-v2 | overall-en-public | 23.1 | codesota-api |
| LLaVA-OneVision-7B | ocrbench-v2 | overall-zh-public | 17.8 | codesota-api |
| TextMonkey | ocrbench-v2 | overall-zh-public | 15.8 | codesota-api |
| Pixtral-12B | ocrbench-v2 | overall-zh-public | 14.6 | codesota-api |
| Monkey | ocrbench-v2 | overall-zh-public | 13.1 | codesota-api |
| Molmo-7B | ocrbench-v2 | overall-zh-public | 12.8 | codesota-api |
| Cambrian-1-8B | ocrbench-v2 | overall-zh-public | 9.9 | codesota-api |
| LLaVA-NeXT-8B | ocrbench-v2 | overall-zh-public | 9.1 | codesota-api |
| NV-Embed-v2 | beir | ndcg@10 | 62.65 | codesota-api |
| GTE-Qwen2-7B-instruct | beir | ndcg@10 | 60.25 | codesota-api |
| E5-Mistral-7B-instruct | beir | ndcg@10 | 56.9 | codesota-api |
| ColBERTv2 | beir | ndcg@10 | 49.4 | codesota-api |
| RankLLaMA-7B | ms-marco | mrr@10 | 41.8 | codesota-api |
| jina-reranker-v2-base-multilingual | ms-marco | mrr@10 | 41.2 | codesota-api |
| ColBERTv2 | ms-marco | mrr@10 | 39.7 | codesota-api |
| MonoT5-3B | ms-marco | mrr@10 | 39 | codesota-api |
| NV-Embed-v2 | mteb | avg-score | 72.31 | codesota-api |
| GTE-Qwen2-7B-instruct | mteb | avg-score | 72.05 | codesota-api |
| voyage-3-large | mteb | avg-score | 70.32 | codesota-api |
| E5-Mistral-7B-instruct | mteb | avg-score | 66.63 | codesota-api |
| jina-embeddings-v3 | mteb | avg-score | 65.18 | codesota-api |
| text-embedding-3-large | mteb | avg-score | 64.6 | codesota-api |
| GTE-Qwen2-7B-instruct | sts-benchmark | spearman | 88.4 | codesota-api |
| E5-Mistral-7B-instruct | sts-benchmark | spearman | 84.7 | codesota-api |
| all-MiniLM-L6-v2 | sts-benchmark | spearman | 82.8 | codesota-api |
| bestfitting (1st place ensemble) | severstal-steel-defect | Dice | 0.90883 | codesota-api |
| 2nd Place Solution | severstal-steel-defect | Dice | 0.9084 | codesota-api |
| U-Net Ensemble (Pavlov) | severstal-steel-defect | Dice | 0.903 | codesota-api |
| Kling 1.0 | vbench | total-score | 85.37 | codesota-api |
| Runway Gen-3 Alpha | vbench | total-score | 85.22 | codesota-api |
| CogVideoX-5B | vbench | total-score | 82.75 | codesota-api |
| Open-Sora 1.2 | vbench | total-score | 80.91 | codesota-api |
| BiGTex | ogb | accuracy-ogbn-products | 90.29 | codesota-api |
| BiGTex | ogb | accuracy-ogbn-arxiv | 88.51 | codesota-api |
| GLEM+GIANT+SAGN+SCR | ogb | accuracy-ogbn-products | 87.37 | codesota-api |
| LD+GIANT+SAGN+SCR | ogb | accuracy-ogbn-products | 87.18 | codesota-api |
| GraDBERT & RevGAT+KD | ogb | accuracy-ogbn-products | 86.92 | codesota-api |
| GraphSAGE | ogb | accuracy-ogbn-products | 83.89 | codesota-api |
| GCN | ogb | accuracy-ogbn-products | 82.33 | codesota-api |
| GAT | ogb | accuracy-ogbn-products | 80.99 | codesota-api |
| SimTeG+TAPE+RevGAT | ogb | accuracy-ogbn-arxiv | 78.03 | codesota-api |
| TAPE+RevGAT | ogb | accuracy-ogbn-arxiv | 77.5 | codesota-api |
| SimTeG+TAPE+GraphSAGE | ogb | accuracy-ogbn-arxiv | 77.48 | codesota-api |
| LD+REVGAT | ogb | accuracy-ogbn-arxiv | 77.26 | codesota-api |
| GraDBERT & RevGAT+KD | ogb | accuracy-ogbn-arxiv | 77.21 | codesota-api |
| GLEM+RevGAT | ogb | accuracy-ogbn-arxiv | 76.94 | codesota-api |
| GCN | ogb | accuracy-ogbn-arxiv | 73.6 | codesota-api |
| GAT | ogb | accuracy-ogbn-arxiv | 73.3 | codesota-api |
| GraphSAGE | ogb | accuracy-ogbn-arxiv | 72.95 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | multi-image-reasoning | 53.65 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | grounded-qa | 52.93 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | grounded-qa | 51 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | multi-image-reasoning | 50.28 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | knowledge-images-qa | 49.33 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | multi-image-reasoning | 48.68 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | grounded-qa | 48.6 | codesota-api |
| InstructBLIP | demon-bench | multi-image-reasoning | 48.55 | codesota-api |
| InstructBLIP | demon-bench | grounded-qa | 47.4 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | knowledge-images-qa | 44.93 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | knowledge-images-qa | 44.93 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | grounded-qa | 44.8 | codesota-api |
| InstructBLIP | demon-bench | knowledge-images-qa | 44.4 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | multi-image-reasoning | 44.03 | codesota-api |
| Otter | demon-bench | multi-image-reasoning | 43.85 | codesota-api |
| MiniGPT-4 | demon-bench | multi-image-reasoning | 43.5 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | multimodal-dialogue | 42.7 | codesota-api |
| mPLUG-Owl | demon-bench | multi-image-reasoning | 42.5 | codesota-api |
| Otter | demon-bench | grounded-qa | 41.67 | codesota-api |
| OpenFlamingo | demon-bench | multi-image-reasoning | 41.63 | codesota-api |
| LLaVA | demon-bench | multi-image-reasoning | 41.53 | codesota-api |
| BLIP-2 | demon-bench | multi-image-reasoning | 39.65 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | accuracy | 39.28 | codesota-api |
| BLIP-2 | demon-bench | grounded-qa | 39.23 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | multimodal-dialogue | 38.14 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | multimodal-dialogue | 37.5 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | accuracy | 37.22 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | accuracy | 36.37 | codesota-api |
| LLaVA | demon-bench | grounded-qa | 36.2 | codesota-api |
| InstructBLIP | demon-bench | multimodal-dialogue | 33.58 | codesota-api |
| BLIP-2 | demon-bench | knowledge-images-qa | 33.53 | codesota-api |
| mPLUG-Owl | demon-bench | grounded-qa | 33.27 | codesota-api |
| InstructBLIP | demon-bench | accuracy | 33 | codesota-api |
| mPLUG-Owl | demon-bench | knowledge-images-qa | 32.47 | codesota-api |
| OpenFlamingo | demon-bench | grounded-qa | 32 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | knowledge-images-qa | 32 | codesota-api |
| OpenFlamingo | demon-bench | knowledge-images-qa | 30.6 | codesota-api |
| MiniGPT-4 | demon-bench | grounded-qa | 30.27 | codesota-api |
| LLaVA | demon-bench | knowledge-images-qa | 28.33 | codesota-api |
| Otter | demon-bench | knowledge-images-qa | 27.73 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | visual-inference | 27.15 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | relation-cloze | 27.15 | codesota-api |
| BLIP-2 | demon-bench | accuracy | 26.92 | codesota-api |
| Cheetah (Vicuna-13B) | demon-bench | storytelling | 26.59 | codesota-api |
| MiniGPT-4 | demon-bench | knowledge-images-qa | 26.4 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | accuracy | 26.3 | codesota-api |
| BLIP-2 | demon-bench | multimodal-dialogue | 26.12 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | visual-inference | 25.9 | codesota-api |
| OpenFlamingo | demon-bench | accuracy | 25.83 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | visual-inference | 25.5 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | storytelling | 25.2 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | storytelling | 24.76 | codesota-api |
| Otter | demon-bench | accuracy | 24.51 | codesota-api |
| InstructBLIP | demon-bench | storytelling | 24.41 | codesota-api |
| OpenFlamingo | demon-bench | storytelling | 24.22 | codesota-api |
| mPLUG-Owl | demon-bench | accuracy | 23.13 | codesota-api |
| Cheetah (LLaMA2-7B) | demon-bench | relation-cloze | 22.95 | codesota-api |
| MiniGPT-4 | demon-bench | accuracy | 22.21 | codesota-api |
| Cheetah (Vicuna-7B) | demon-bench | relation-cloze | 22.15 | codesota-api |
| OpenFlamingo | demon-bench | relation-cloze | 21.65 | codesota-api |
| BLIP-2 | demon-bench | storytelling | 21.31 | codesota-api |
| LLaVA | demon-bench | accuracy | 21.24 | codesota-api |
| InstructBLIP | demon-bench | relation-cloze | 21.2 | codesota-api |
| mPLUG-Owl | demon-bench | storytelling | 19.33 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | relation-cloze | 18 | codesota-api |
| BLIP-2 | demon-bench | relation-cloze | 17.94 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | storytelling | 17.57 | codesota-api |
| MiniGPT-4 | demon-bench | storytelling | 17.07 | codesota-api |
| OpenFlamingo | demon-bench | multimodal-dialogue | 16.88 | codesota-api |
| MiniGPT-4 | demon-bench | relation-cloze | 16.6 | codesota-api |
| mPLUG-Owl | demon-bench | relation-cloze | 16.25 | codesota-api |
| Otter | demon-bench | relation-cloze | 16 | codesota-api |
| LLaVA | demon-bench | relation-cloze | 15.85 | codesota-api |
| Otter | demon-bench | storytelling | 15.57 | codesota-api |
| Otter | demon-bench | multimodal-dialogue | 15.37 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | multimodal-dialogue | 14.22 | codesota-api |
| OpenFlamingo | demon-bench | visual-inference | 13.85 | codesota-api |
| MiniGPT-4 | demon-bench | multimodal-dialogue | 13.69 | codesota-api |
| LLaMA-Adapter V2 | demon-bench | visual-inference | 13.51 | codesota-api |
| mPLUG-Owl | demon-bench | multimodal-dialogue | 12.67 | codesota-api |
| InstructBLIP | demon-bench | visual-inference | 11.49 | codesota-api |
| Otter | demon-bench | visual-inference | 11.39 | codesota-api |
| LLaVA | demon-bench | storytelling | 10.7 | codesota-api |
| BLIP-2 | demon-bench | visual-inference | 10.67 | codesota-api |
| LLaVA | demon-bench | visual-inference | 8.27 | codesota-api |
| MiniGPT-4 | demon-bench | visual-inference | 7.95 | codesota-api |
| LLaVA | demon-bench | multimodal-dialogue | 7.79 | codesota-api |
| mPLUG-Owl | demon-bench | visual-inference | 5.4 | codesota-api |
| InterFuser | carla-leaderboard | driving_score | 76.18 | codesota-api |
| TCP | carla-leaderboard | driving_score | 75.14 | codesota-api |
| Think2Drive | carla-leaderboard | driving_score | 46 | codesota-api |
| mineru-2.5 | omnidocbench | layout-map | 97.5 | codesota-api |
| GLM-OCR | omnidocbench | composite | 94.62 | codesota-api |
| PaddleOCR-VL-1.5 | omnidocbench | composite | 94.5 | codesota-api |
| paddleocr-vl | omnidocbench | table-teds | 93.52 | codesota-api |
| Qianfan-OCR | omnidocbench | composite | 93.12 | codesota-api |
| paddleocr-vl | omnidocbench | composite | 92.86 | codesota-api |
| paddleocr-vl-0.9b | omnidocbench | composite | 92.56 | codesota-api |
| Qianfan-OCR | omnidocbench | formula-cdm | 92.43 | codesota-api |
| mistral-ocr-3 | omnidocbench | reading-order | 91.63 | codesota-api |
| Qianfan-OCR | omnidocbench | table-teds | 91.02 | codesota-api |
| mineru-2.5 | omnidocbench | composite | 90.67 | codesota-api |
| Gemini 3 Pro | omnidocbench | composite | 90.33 | codesota-api |
| Dolphin-v2 | omnidocbench | composite | 89.78 | codesota-api |
| qwen3-vl-235b | omnidocbench | composite | 89.15 | codesota-api |
| monkeyocr-pro-3b | omnidocbench | composite | 88.85 | codesota-api |
| ocrverse-4b | omnidocbench | composite | 88.56 | codesota-api |
| dots-ocr-3b | omnidocbench | composite | 88.41 | codesota-api |
| gemini-25-pro | omnidocbench | composite | 88.03 | codesota-api |
| MonkeyOCR-3B | omnidocbench | composite | 87.13 | codesota-api |
| qwen25-vl | omnidocbench | composite | 87.02 | codesota-api |
| MonkeyOCR-pro-1.2B | omnidocbench | composite | 86.96 | codesota-api |
| PP-StructureV3 | omnidocbench | composite | 86.73 | codesota-api |
| DeepSeek-OCR | omnidocbench | composite | 86.46 | codesota-api |
| clearocr-teamquest | omnidocbench | reading-order | 86.04 | codesota-api |
| Nanonets-OCR-s | omnidocbench | composite | 85.59 | codesota-api |
| MinerU2-VLM | omnidocbench | composite | 85.56 | codesota-api |
| Dolphin-1.5 | omnidocbench | composite | 85.06 | codesota-api |
| InternVL3.5-241B | omnidocbench | composite | 82.67 | codesota-api |
| olmOCR-7B | omnidocbench | composite | 81.79 | codesota-api |
| POINTS-Reader | omnidocbench | composite | 80.98 | codesota-api |
| InternVL3-76B | omnidocbench | composite | 80.33 | codesota-api |
| mistral-ocr-3 | omnidocbench | composite | 79.75 | codesota-api |
| mistral-ocr-2512 | omnidocbench | composite | 79.75 | codesota-api |
| MinerU2-pipeline | omnidocbench | composite | 75.51 | codesota-api |
| GPT-4o | omnidocbench | composite | 75.02 | codesota-api |
| OCRFlux-3B | omnidocbench | composite | 74.82 | codesota-api |
| Dolphin | omnidocbench | composite | 74.67 | codesota-api |
| Marker 1.8.2 | omnidocbench | composite | 71.3 | codesota-api |
| mistral-ocr-3 | omnidocbench | table-teds | 70.88 | codesota-api |
| clearocr-teamquest | omnidocbench | composite | 31.7 | codesota-api |
| clearocr-teamquest | omnidocbench | formula-edit-distance | 0.902 | codesota-api |
| clearocr-teamquest | omnidocbench | table-teds | 0.8 | codesota-api |
| mistral-ocr-3 | omnidocbench | formula-edit-distance | 0.218 | codesota-api |
| clearocr-teamquest | omnidocbench | text-edit-distance | 0.154 | codesota-api |
| mistral-ocr-3 | omnidocbench | text-edit-distance | 0.099 | codesota-api |
| Qianfan-OCR | omnidocbench | text-edit | 0.041 | codesota-api |
| gpt-4o | omnidocbench | ocr-edit-distance | 0.02 | codesota-api |
| RVT-2 | rlbench | success-rate | 81.4 | codesota-api |
| RVT | rlbench | success-rate | 62.9 | codesota-api |
| PerAct | rlbench | success-rate | 43.4 | codesota-api |
| OVRL-V2 | habitat-objectnav-hm3d | success_rate | 64.7 | codesota-api |
| Habitat-Web | habitat-objectnav-hm3d | success_rate | 35.4 | codesota-api |
| WavLLM | audiobench | avg-score | 50.25 | codesota-api |
| SALMONN | audiobench | avg-score | 43.99 | codesota-api |
| Qwen2-Audio-Instruct | audiobench | avg-score | 42.12 | codesota-api |
| Whisper+LLaMA-3 (cascade) | audiobench | avg-score | 40.9 | codesota-api |
| Qwen-Audio-Chat | audiobench | avg-score | 38.59 | codesota-api |
| BEATs | audioset | map | 0.506 | codesota-api |
| AST | audioset | map | 0.485 | codesota-api |
| HTS-AT | audioset | map | 0.471 | codesota-api |
| CLAP | audioset | map | 0.428 | codesota-api |
| TD3 | mujoco | average-return | 5592 | codesota-api |
| SAC | mujoco | average-return | 5179 | codesota-api |
| PPO | mujoco | average-return | 2038 | codesota-api |
| TD-MPC2 (317M params) | mujoco | average-return | 960 | codesota-api |
| TD-MPC2 (19M params) | mujoco | average-return | 953 | codesota-api |
| FOWM | mujoco | average-return | 945 | codesota-api |
| BRO | mujoco | average-return | 941 | codesota-api |
| TD-MPC2 (5M params) | mujoco | average-return | 929 | codesota-api |
| DreamerV3 | mujoco | average-return | 897 | codesota-api |
| TD-MPC | mujoco | average-return | 857 | codesota-api |
| DrQ-v2 | mujoco | average-return | 799 | codesota-api |
| SAC (state-based) | mujoco | average-return | 777 | codesota-api |
| go-explore | atari-2600 | human-normalized-score | 40000 | codesota-api |
| agent57 | atari-2600 | human-normalized-score | 4731.3 | codesota-api |
| MEME | atari-2600 | human-normalized-score | 4087 | codesota-api |
| bbos-1 | atari-2600 | human-normalized-score | 1100 | codesota-api |
| gdi-h3 | atari-2600 | human-normalized-score | 950 | codesota-api |
| dreamerv3 | atari-2600 | human-normalized-score | 840 | codesota-api |
| muzero | atari-2600 | human-normalized-score | 731 | codesota-api |
| EfficientZero V2 | atari-2600 | human-normalized-score | 242.8 | codesota-api |
| rainbow-dqn | atari-2600 | human-normalized-score | 231 | codesota-api |
| BBF (Bigger, Better, Faster) | atari-2600 | human-normalized-score | 224.7 | codesota-api |
| DIAMOND | atari-2600 | human-normalized-score | 145.9 | codesota-api |
| STORM | atari-2600 | human-normalized-score | 126.7 | codesota-api |
| Simulus | atari-2600 | human-normalized-score | 110 | codesota-api |
| DART | atari-2600 | human-normalized-score | 102.2 | codesota-api |
| human-gamer | atari-2600 | human-normalized-score | 100 | codesota-api |
| dqn | atari-2600 | human-normalized-score | 79 | codesota-api |
| SegFormer-B5 | cityscapes | miou | 84 | codesota-api |
| Mask2Former (Swin-L) | cityscapes | miou | 83.3 | codesota-api |
| OneFormer (DiNAT-L) | cityscapes | miou | 83 | codesota-api |
| Qwen2-VL 72B | vqa-v2 | accuracy | 87.6 | codesota-api |
| InternVL2-76B | vqa-v2 | accuracy | 87.2 | codesota-api |
| Gemini 1.5 Pro | vqa-v2 | accuracy | 86.5 | codesota-api |
| PaLI-X 55B | vqa-v2 | accuracy | 86.1 | codesota-api |
| NVLM-D 1.0 72B | vqa-v2 | accuracy | 85.4 | codesota-api |
| NVLM-X 1.0 72B | vqa-v2 | accuracy | 85.2 | codesota-api |
| NVLM-H 1.0 72B | vqa-v2 | accuracy | 85.2 | codesota-api |
| VILA-1.5 40B | vqa-v2 | accuracy | 84.3 | codesota-api |
| LLaVA-NeXT 34B | vqa-v2 | accuracy | 83.7 | codesota-api |
| LLaVA-NeXT 13B | vqa-v2 | accuracy | 82.8 | codesota-api |
| CogVLM-17B | vqa-v2 | accuracy | 82.3 | codesota-api |
| LLaVA-NeXT 7B (Mistral) | vqa-v2 | accuracy | 82.2 | codesota-api |
| BLIP-2 | vqa-v2 | accuracy | 82.19 | codesota-api |
| LLaVA-NeXT 7B (Vicuna) | vqa-v2 | accuracy | 81.8 | codesota-api |
| Pixtral Large | vqa-v2 | accuracy | 80.9 | codesota-api |
| Llama 3-V 405B | vqa-v2 | accuracy | 80.2 | codesota-api |
| LLaVA-1.5 13B | vqa-v2 | accuracy | 80 | codesota-api |
| LLaVA-1.5 | vqa-v2 | accuracy | 80 | codesota-api |
| Llama 3-V 70B | vqa-v2 | accuracy | 79.1 | codesota-api |
| Pixtral-12B | vqa-v2 | accuracy | 78.6 | codesota-api |
| GPT-4o | vqa-v2 | accuracy | 78.5 | codesota-api |
| Llama 3.2 90B Vision Instruct | vqa-v2 | accuracy | 78.1 | codesota-api |
| GPT-4V | vqa-v2 | accuracy | 77.2 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | base | 99.9 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | base | 99.7 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | base | 99.6 | codesota-api |
| Qianfan-OCR | olmocr-bench | base | 99.6 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | headers-footers | 96.1 | codesota-api |
| olmocr-v0.3.0 | olmocr-bench | headers-footers | 95.1 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | long-tiny-text | 92.3 | codesota-api |
| Qianfan-OCR | olmocr-bench | multi-column | 92.2 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | long-tiny-text | 91.4 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | headers-footers | 90.8 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | arxiv | 89.6 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | tables | 89 | codesota-api |
| dots-ocr-3b | olmocr-bench | tables | 88.3 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | tables | 88 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | old-scans-math | 85.6 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | tables | 84.9 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | multi-column | 84.8 | codesota-api |
| dots.mocr | olmocr-bench | pass-rate | 83.9 | codesota-api |
| marker-1.10.0 | olmocr-bench | arxiv | 83.8 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | multi-column | 83.7 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | pass-rate | 83.2 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | pass-rate | 83.1 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | arxiv | 83 | codesota-api |
| infinity-parser-7b | olmocr-bench | pass-rate | 82.5 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | pass-rate | 82.4 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | old-scans-math | 82.3 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | arxiv | 82.2 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | long-tiny-text | 81.9 | codesota-api |
| Qianfan-OCR | olmocr-bench | tables | 81.6 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | multi-column | 81.2 | codesota-api |
| Qianfan-OCR | olmocr-bench | long-tiny-text | 80.4 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | old-scans-math | 80.3 | codesota-api |
| Qianfan-OCR | olmocr-bench | arxiv | 80.1 | codesota-api |
| paddleocr-vl | olmocr-bench | pass-rate | 80 | codesota-api |
| olmocr-v0.3.0 | olmocr-bench | old-scans-math | 79.9 | codesota-api |
| Qianfan-OCR | olmocr-bench | pass-rate | 79.8 | codesota-api |
| Qwen3-VL-4B | olmocr-bench | pass-rate | 79.2 | codesota-api |
| PaddleOCR-VL-1.5 | olmocr-bench | pass-rate | 79.1 | codesota-api |
| dots-ocr-3b | olmocr-bench | pass-rate | 79.1 | codesota-api |
| mistral-ocr-3 | olmocr-bench | pass-rate | 78 | codesota-api |
| marker-1.10.0 | olmocr-bench | pass-rate | 76.5 | codesota-api |
| marker-1.10.1 | olmocr-bench | pass-rate | 76.1 | codesota-api |
| MonkeyOCR-pro-3B | olmocr-bench | pass-rate | 75.8 | codesota-api |
| deepseek-ocr | olmocr-bench | pass-rate | 75.7 | codesota-api |
| mineru-2.5 | olmocr-bench | pass-rate | 75.2 | codesota-api |
| Qianfan-OCR | olmocr-bench | old-scans | 73.1 | codesota-api |
| mistral-ocr-api | olmocr-bench | pass-rate | 72 | codesota-api |
| gpt-4o-anchored | olmocr-bench | pass-rate | 69.9 | codesota-api |
| nanonets-ocr2-3b | olmocr-bench | pass-rate | 69.5 | codesota-api |
| gemini-flash-2 | olmocr-bench | pass-rate | 63.8 | codesota-api |
| chandra-ocr-0.1.0 | olmocr-bench | old-scans | 50.4 | codesota-api |
| olmocr-v0.4.0 | olmocr-bench | old-scans | 47.7 | codesota-api |
| LightOnOCR-2-1B | olmocr-bench | old-scans | 42.2 | codesota-api |
| Qianfan-OCR | olmocr-bench | headers-footers | 42 | codesota-api |
| gpt-4o | olmocr-bench | old-scans | 40.7 | codesota-api |
| Med-Gemini | medqa-usmle | Accuracy | 91.1 | codesota-api |
| Med-PaLM 2 | medqa-usmle | Accuracy | 86.5 | codesota-api |
| GPT-4 (base) | medqa-usmle | Accuracy | 86.1 | codesota-api |
| LayoutLMv3-large | funsd | f1 | 92.08 | codesota-api |
| UDOP | funsd | f1 | 91.62 | codesota-api |
| LayoutLMv3-base | funsd | f1 | 90.29 | codesota-api |
| DocFormerv2-large | funsd | f1 | 88.89 | codesota-api |
| LiLT[EN-R2]-base | funsd | f1 | 88.41 | codesota-api |
| DocFormerv2-base | funsd | f1 | 88.37 | codesota-api |
| StructuralLM | funsd | f1 | 85.14 | codesota-api |
| FormNet | funsd | f1 | 84.69 | codesota-api |
| BROS-large | funsd | f1 | 84.52 | codesota-api |
| LayoutLMv2-large | funsd | f1 | 84.2 | codesota-api |
| LayoutLMv2-base | funsd | f1 | 82.76 | codesota-api |
| LayoutLMv1-base | funsd | f1 | 79.27 | codesota-api |
| LayoutLMv1-large | funsd | f1 | 77.89 | codesota-api |
| DeepSeek-R1-0528 | livecodebench | pass@1 | 73.3 | codesota-api |
| Qwen3-235B-A22B | livecodebench | pass@1 | 70.7 | codesota-api |
| DeepSeek-R1 | livecodebench | pass@1 | 65.9 | codesota-api |
| DeepSeek-R1-Distill-Llama-70B | livecodebench | pass@1 | 65.2 | codesota-api |
| OpenAI o1 (Dec 2024) | livecodebench | pass@1 | 63.4 | codesota-api |
| Kimi k1.5 (long-CoT) | livecodebench | pass@1 | 62.5 | codesota-api |
| DeepSeek-R1-Distill-Qwen-32B | livecodebench | pass@1 | 62.1 | codesota-api |
| DeepSeek-R1-Distill-Qwen-14B | livecodebench | pass@1 | 59.1 | codesota-api |
| o1-mini | livecodebench | pass@1 | 53.8 | codesota-api |
| DeepSeek-V3-0324 | livecodebench | pass@1 | 49.2 | codesota-api |
| DeepSeek-R1-Distill-Qwen-7B | livecodebench | pass@1 | 49.1 | codesota-api |
| DeepSeek-R1-Distill-Llama-8B | livecodebench | pass@1 | 49 | codesota-api |
| Kimi k1.5 (short-CoT) | livecodebench | pass@1 | 47.3 | codesota-api |
| Llama 4 Maverick (17B-128E) | livecodebench | pass@1 | 43.4 | codesota-api |
| DeepSeek-V3 | livecodebench | pass@1 | 40.5 | codesota-api |
| Gemma 3 27B IT | livecodebench | pass@1 | 39 | codesota-api |
| Claude 3.5 Sonnet | livecodebench | pass@1 | 38.9 | codesota-api |
| GPT-4o | livecodebench | pass@1 | 32.9 | codesota-api |
| Llama 4 Scout (17B-16E) | livecodebench | pass@1 | 32.8 | codesota-api |
| Gemma 3 12B IT | livecodebench | pass@1 | 32 | codesota-api |
| Qwen2.5-Coder-32B-Instruct | livecodebench | pass@1 | 31.4 | codesota-api |
| Gemma 3 4B IT | livecodebench | pass@1 | 23 | codesota-api |
| o4-mini (high) | humaneval | pass@1 | 99.3 | codesota-api |
| o3-mini (high) | humaneval | pass@1 | 97.6 | codesota-api |
| o4-mini | humaneval | pass@1 | 97.3 | codesota-api |
| o3-mini | humaneval | pass@1 | 96.3 | codesota-api |
| gpt-41 | humaneval | pass@1 | 94.5 | codesota-api |
| GPT-4.1 mini | humaneval | pass@1 | 93.8 | codesota-api |
| Qwen2.5-Coder-32B-Instruct | humaneval | pass@1 | 92.7 | codesota-api |
| o1-preview | humaneval | pass@1 | 92.4 | codesota-api |
| o1-mini | humaneval | pass@1 | 92.4 | codesota-api |
| Claude 3.5 Sonnet (Oct 2024) | humaneval | pass@1 | 92.1 | codesota-api |
| claude-35-sonnet | humaneval | pass@1 | 92 | codesota-api |
| gpt-4o | humaneval | pass@1 | 91 | codesota-api |
| GPT-4o (Nov 2024) | humaneval | pass@1 | 90.2 | codesota-api |
| llama-31-405b | humaneval | pass@1 | 89 | codesota-api |
| gpt-45-preview | humaneval | pass@1 | 88.6 | codesota-api |
| grok-2 | humaneval | pass@1 | 88.4 | codesota-api |
| Qwen2.5-Coder-7B-Instruct | humaneval | pass@1 | 88.4 | codesota-api |
| o3 (high) | humaneval | pass@1 | 88.4 | codesota-api |
| gpt-4-turbo | humaneval | pass@1 | 88.2 | codesota-api |
| Gemma 3 27B IT | humaneval | pass@1 | 87.8 | codesota-api |
| o3 | humaneval | pass@1 | 87.4 | codesota-api |
| gpt-4o-mini | humaneval | pass@1 | 87.2 | codesota-api |
| GPT-4.1 nano | humaneval | pass@1 | 87 | codesota-api |
| Gemma 3 12B IT | humaneval | pass@1 | 85.4 | codesota-api |
| DeepSeek-Coder-V2-Instruct | humaneval | pass@1 | 85.4 | codesota-api |
| claude-3-opus | humaneval | pass@1 | 84.9 | codesota-api |
| Phi-4 (14B) | humaneval | pass@1 | 82.6 | codesota-api |
| deepseek-v3 | humaneval | pass@1 | 82.6 | codesota-api |
| llama-3-70b | humaneval | pass@1 | 81.7 | codesota-api |
| llama-31-70b | humaneval | pass@1 | 80.5 | codesota-api |
| gemini-15-pro | humaneval | pass@1 | 71.9 | codesota-api |
| Gemma 3 4B IT | humaneval | pass@1 | 71.3 | codesota-api |
| DeepSeek-V3 | humaneval | pass@1 | 65.2 | codesota-api |
| DINOv2 ViT-g/14 | imagenet-linear-probe | top1_accuracy | 86.5 | codesota-api |
| DINOv2 ViT-g/14 | imagenet-linear-probe | top-1-accuracy | 86.5 | codesota-api |
| DINOv2 ViT-L/14 | imagenet-linear-probe | top-1-accuracy | 86.3 | codesota-api |
| CLIP ViT-L/14 | imagenet-linear-probe | top-1-accuracy | 85.3 | codesota-api |
| SimCLRv2 (ResNet-152 3x) | imagenet-linear-probe | top1_accuracy | 79.8 | codesota-api |
| MAE ViT-H/14 | imagenet-linear-probe | top-1-accuracy | 77.2 | codesota-api |
| MAE ViT-H/14 | imagenet-linear-probe | top1_accuracy | 76.6 | codesota-api |
| MAE ViT-L/16 | imagenet-linear-probe | top-1-accuracy | 76 | codesota-api |
| Nova 2 | wildasr | cer | 10.1 | codesota-api |
| Qwen2-Audio | wildasr | cer | 9.1 | codesota-api |
| Scribe V1 | wildasr | cer | 8.7 | codesota-api |
| Whisper Large V3 | wildasr | cer | 7.5 | codesota-api |
| Gemini 2.5 Pro | wildasr | cer | 6.7 | codesota-api |
| GPT-4o Transcribe | wildasr | cer | 6.4 | codesota-api |
| Gemini 3 Pro | wildasr | cer | 6.1 | codesota-api |
| Nova 2 | wildasr | wer | 6 | codesota-api |
| Qwen2-Audio | wildasr | wer | 5.8 | codesota-api |
| Whisper Large V3 | wildasr | wer | 4.2 | codesota-api |
| Gemini 2.5 Pro | wildasr | wer | 3.6 | codesota-api |
| Scribe V1 | wildasr | wer | 3.6 | codesota-api |
| Gemini 3 Pro | wildasr | wer | 2.8 | codesota-api |
| GPT-4o Transcribe | wildasr | wer | 2.8 | codesota-api |
| AutoAttack vs Undefended ResNet | robustbench-cifar10-linf-attack | Attack Success Rate | 100 | codesota-api |
| AutoAttack vs Wang 2023 | robustbench-cifar10-linf-attack | Attack Success Rate | 29.3 | codesota-api |
| AutoAttack vs Peng 2023 | robustbench-cifar10-linf-attack | Attack Success Rate | 28.8 | codesota-api |
| TAPE + RevGAT | cora | accuracy | 92.9 | codesota-api |
| AuGLM (T5-large) | cora | accuracy | 91.51 | codesota-api |
| ENGINE | cora | accuracy | 91.48 | codesota-api |
| InstructGLM | cora | accuracy | 90.77 | codesota-api |
| GLEM + RevGAT | cora | accuracy | 88.56 | codesota-api |
| GCNLLMEmb | cora | accuracy | 88.15 | codesota-api |
| LLaGA (Mistral-7B) | cora | accuracy | 87.55 | codesota-api |
| SDGAT | cora | accuracy | 85.29 | codesota-api |
| GCN* (tuned) | cora | accuracy | 85.08 | codesota-api |
| GAT* (tuned) | cora | accuracy | 84.64 | codesota-api |
| SGFormer | cora | accuracy | 84.5 | codesota-api |
| GraphSAGE* (tuned) | cora | accuracy | 84.18 | codesota-api |
| Polynormer | cora | accuracy | 83.25 | codesota-api |
| GOAT | cora | accuracy | 83.18 | codesota-api |
| GAT | cora | accuracy | 83 | codesota-api |
| GraphGPS | cora | accuracy | 82.84 | codesota-api |
| Exphormer | cora | accuracy | 82.77 | codesota-api |
| GraphSAGE | cora | accuracy | 82.68 | codesota-api |
| NodeFormer | cora | accuracy | 82.2 | codesota-api |
| NAGphormer | cora | accuracy | 82.12 | codesota-api |
| GCN | cora | accuracy | 81.5 | codesota-api |
| ViTPose-H | coco-keypoints | ap | 80.9 | codesota-api |
| RTMPose-X | coco-keypoints | ap | 78.8 | codesota-api |
| HRNet-W48 | coco-keypoints | ap | 75.5 | codesota-api |
| GROVER-Large | moleculenet-bbbp | ROC-AUC | 0.94 | codesota-api |
| D-MPNN (ChemProp) | moleculenet-bbbp | ROC-AUC | 0.913 | codesota-api |
| MolCLR | moleculenet-bbbp | ROC-AUC | 0.736 | codesota-api |
| ZoeDepth-N | nyu-depth-v2 | absrel | 0.075 | codesota-api |
| Marigold | nyu-depth-v2 | absrel | 0.055 | codesota-api |
| MiDaS 3.1 (BEiT-512) | nyu-depth-v2 | absrel | 0.048 | codesota-api |
| Depth Anything V1 (ViT-L) | nyu-depth-v2 | absrel | 0.045 | codesota-api |
| Depth Anything V2 (ViT-L) | nyu-depth-v2 | absrel | 0.041 | codesota-api |
| Megatron-BERT | race | accuracy | 90.9 | codesota-api |
| ALBERT (Ensemble) | race | accuracy | 89.4 | codesota-api |
| GPT-4 | xnli | accuracy | 87.4 | codesota-api |
| XLM-RoBERTa-large | xnli | accuracy | 83.6 | codesota-api |
| mDeBERTa-v3-base | xnli | accuracy | 80.8 | codesota-api |
| Puigcerver | rimes | wer | 9.9 | codesota-api |
| GatedHTR | rimes | wer | 8.7 | codesota-api |
| Puigcerver | rimes | cer | 3.21 | codesota-api |
| VAN | rimes | cer | 1.91 | codesota-api |
| GatedHTR | rimes | cer | 1.81 | codesota-api |
| Stable Audio Open | audiocaps-t2a | fad | 2.57 | codesota-api |
| AudioGen Medium | audiocaps-t2a | fad | 1.82 | codesota-api |
| AudioLDM 2 | audiocaps-t2a | fad | 1.42 | codesota-api |
| AudioLDM | audiocaps | fad | 4.48 | codesota-api |
| AudioLDM 2-Full-Large | audiocaps | fad | 1.86 | codesota-api |
| AudioLDM 2-Full | audiocaps | fad | 1.78 | codesota-api |
| TANGO | audiocaps | fad | 1.73 | codesota-api |
| AudioLDM 2-AC-Large | audiocaps | fad | 1.42 | codesota-api |
| EVA-CLIP-18B | imagenet-zero-shot | top-1 | 83.8 | codesota-api |
| SigLIP-SO400M | imagenet-zero-shot | top-1 | 83.2 | codesota-api |
| OpenCLIP ViT-G/14 | imagenet-zero-shot | top-1 | 80.1 | codesota-api |
| CLIP ViT-L/14 | imagenet-zero-shot | top-1 | 75.5 | codesota-api |
| Diffusion-QL | d4rl-halfcheetah-medium | normalized_return | 51.1 | codesota-api |
| IQL (Implicit Q-Learning) | d4rl-halfcheetah-medium | normalized_return | 47.4 | codesota-api |
| CQL (Conservative Q-Learning) | d4rl-halfcheetah-medium | normalized_return | 44 | codesota-api |
| π0 (Pi-Zero) | libero-long | success_rate | 85.2 | codesota-api |
| OpenVLA | libero-long | success_rate | 53.7 | codesota-api |
| Octo-Base | libero-long | success_rate | 51.1 | codesota-api |
| Qwen2.5-Coder-32B | mbpp-plus | pass@1 | 76.4 | codesota-api |
| DeepSeek-V3 | mbpp-plus | pass@1 | 73 | codesota-api |
| GPT-4o | mbpp-plus | pass@1 | 71.2 | codesota-api |
| DeepSeek-Coder-33B | mbpp-plus | pass@1 | 66 | codesota-api |
| Marigold | kitti-depth | absrel | 0.099 | codesota-api |
| MiDaS 3.1 (BEiT-512) | kitti-depth | absrel | 0.058 | codesota-api |
| ZoeDepth-K | kitti-depth | absrel | 0.053 | codesota-api |
| Depth Anything V1 (ViT-L) | kitti-depth | absrel | 0.046 | codesota-api |
| Depth Anything V2 (ViT-L) | kitti-depth | absrel | 0.04 | codesota-api |
| SegNet (class-level) | dagm-2007 | Accuracy | 100 | codesota-api |
| ResNet baseline | dagm-2007 | Accuracy | 99.8 | codesota-api |
| VAN | iam | wer | 16.3 | codesota-api |
| HTR-VT | iam | wer | 14.9 | codesota-api |
| HTR-ConvText | iam | wer | 12.9 | codesota-api |
| VAN | iam | cer | 5 | codesota-api |
| HTR-VT | iam | cer | 4.7 | codesota-api |
| HTR-ConvText | iam | cer | 4 | codesota-api |
| TrOCR-base | iam | cer | 3.42 | codesota-api |
| TrOCR-large | iam | cer | 2.89 | codesota-api |
| Qwen2.5-Coder-32B | humaneval-plus | pass@1 | 87.2 | codesota-api |
| DeepSeek-V3 | humaneval-plus | pass@1 | 86.6 | codesota-api |
| GPT-4o | humaneval-plus | pass@1 | 86 | codesota-api |
| DeepSeek-Coder-V2 | humaneval-plus | pass@1 | 82.3 | codesota-api |
| DeepSeek-Coder-33B | humaneval-plus | pass@1 | 75 | codesota-api |
| HIVE-COTE 2.0 | ucr-archive | mean_accuracy | 88.6 | codesota-api |
| Hydra + MultiRocket | ucr-archive | mean_accuracy | 88.3 | codesota-api |
| InceptionTime | ucr-archive | mean_accuracy | 85 | codesota-api |
| InternVideo2 | kinetics-400 | top-1 | 92.1 | codesota-api |
| VideoMAE V2 (ViT-g) | kinetics-400 | top-1 | 90 | codesota-api |
| ViViT-H | kinetics-400 | top-1 | 84.9 | codesota-api |
| TimeSformer-L | kinetics-400 | top-1 | 80.7 | codesota-api |
| GPT-4 | wmt23 | comet | 84.1 | codesota-api |
| Google Translate | wmt23 | comet | 83.8 | codesota-api |
| DeepL | wmt23 | comet | 83.5 | codesota-api |
| NLLB-3.3B | wmt23 | comet | 81.6 | codesota-api |
| DINOv2 ViT-g/14 | imagenet-knn | top1_accuracy | 83.5 | codesota-api |
| DINOv2 ViT-L/14 | imagenet-knn | top1_accuracy | 83.5 | codesota-api |
| DINO ViT-B/16 | imagenet-knn | top1_accuracy | 76.1 | codesota-api |
| PaLI-X-55B | ok-vqa | accuracy | 66.1 | codesota-api |
| PaLI-17B | ok-vqa | accuracy | 64.5 | codesota-api |
| GPT-4V | ok-vqa | accuracy | 64.28 | codesota-api |
| Flamingo-80B | ok-vqa | accuracy | 57.8 | codesota-api |
| BLIP-2 (FlanT5XXL) | ok-vqa | accuracy | 44.7 | codesota-api |
| EVA-02-L | cifar-100 | accuracy | 97.15 | codesota-api |
| CoAtNet-7 | cifar-100 | accuracy | 96.38 | codesota-api |
| ConvNeXt V2-H | cifar-100 | accuracy | 96.17 | codesota-api |
| MAE ViT-H/14 | cifar-100 | accuracy | 96.08 | codesota-api |
| SwinV2-G | cifar-100 | accuracy | 96.01 | codesota-api |
| DeiT III-H/14 | cifar-100 | accuracy | 95.94 | codesota-api |
| InternImage-XL | cifar-100 | accuracy | 95.77 | codesota-api |
| FasterViT-6 | cifar-100 | accuracy | 95.72 | codesota-api |
| vit-h-14 | cifar-100 | accuracy | 94.55 | codesota-api |
| AIMv2-3B | cifar-100 | accuracy | 94.5 | codesota-api |
| AIMv2-1B | cifar-100 | accuracy | 94.1 | codesota-api |
| ViT-L/16 (IN-21K) | cifar-100 | accuracy | 93.25 | codesota-api |
| efficientnet-b7 | cifar-100 | accuracy | 91.7 | codesota-api |
| vit-b-16 | cifar-100 | accuracy | 91.48 | codesota-api |
| resnet-50 | cifar-100 | accuracy | 78.04 | codesota-api |
| coca-finetuned | imagenet-1k | top-1-accuracy | 91 | codesota-api |
| vit-g-14 | imagenet-1k | top-1-accuracy | 90.45 | codesota-api |
| EVA-02-L | imagenet-1k | top-1-accuracy | 90.056 | codesota-api |
| EVA-Giant | imagenet-1k | top-1-accuracy | 89.79 | codesota-api |
| InternImage-H | imagenet-1k | top-1-accuracy | 89.6 | codesota-api |
| SigLIP-SO400M | imagenet-1k | top-1-accuracy | 89.41 | codesota-api |
| convnext-v2-huge | imagenet-1k | top-1-accuracy | 88.9 | codesota-api |
| ViT-H/14 CLIP (LAION-2B) | imagenet-1k | top-1-accuracy | 88.634 | codesota-api |
| ConvNeXt-XXLarge (CLIP LAION) | imagenet-1k | top-1-accuracy | 88.622 | codesota-api |
| vit-h-14 | imagenet-1k | top-1-accuracy | 88.55 | codesota-api |
| swin-large | imagenet-1k | top-1-accuracy | 87.3 | codesota-api |
| efficientnet-v2-l | imagenet-1k | top-1-accuracy | 85.7 | codesota-api |
| deit-b-distilled | imagenet-1k | top-1-accuracy | 85.2 | codesota-api |
| efficientnet-b7 | imagenet-1k | top-1-accuracy | 84.4 | codesota-api |
| deit-b | imagenet-1k | top-1-accuracy | 83.1 | codesota-api |
| convnext-v2-tiny | imagenet-1k | top-1-accuracy | 83 | codesota-api |
| vit-l-16 | imagenet-1k | top-1-accuracy | 82.7 | codesota-api |
| vit-b-16 | imagenet-1k | top-1-accuracy | 81.2 | codesota-api |
| resnet-50-a3 | imagenet-1k | top-1-accuracy | 80.4 | codesota-api |
| resnet-152 | imagenet-1k | top-1-accuracy | 78.6 | codesota-api |
| efficientnet-b0 | imagenet-1k | top-1-accuracy | 77.1 | codesota-api |
| resnet-50 | imagenet-1k | top-1-accuracy | 76.15 | codesota-api |
| MusicGen-Medium | musiccaps | fad | 4.89 | codesota-api |
| AudioLDM 2-MSD | musiccaps | fad | 4.47 | codesota-api |
| MusicLM | musiccaps | fad | 4 | codesota-api |
| AudioLDM-M | musiccaps | fad | 3.2 | codesota-api |
| AudioLDM 2-Full | musiccaps | fad | 3.13 | codesota-api |
| SAM 2 (Hiera-L) | sa-1b | miou | 62.2 | codesota-api |
| SAM (ViT-H) | sa-1b | miou | 58.1 | codesota-api |
| FastSAM | sa-1b | miou | 57.1 | codesota-api |
| EfficientSAM | sa-1b | miou | 55.5 | codesota-api |
| CogVLM-17B | nocaps | cider | 128.3 | codesota-api |
| PaLI-X-55B | nocaps | cider | 126.3 | codesota-api |
| PaLI-17B | nocaps | cider | 124.4 | codesota-api |
| BLIP-2 (FlanT5XL) | nocaps | cider | 123.7 | codesota-api |
| BLIP-2 (OPT 2.7B) | nocaps | cider | 121.6 | codesota-api |
| BEATs | esc-50 | accuracy | 98.1 | codesota-api |
| HTS-AT | esc-50 | accuracy | 97 | codesota-api |
| AST | esc-50 | accuracy | 95.6 | codesota-api |
| CLAP | esc-50 | accuracy | 93.7 | codesota-api |
| GPT-4 | wikitablequestions | accuracy | 75.3 | codesota-api |
Claude 3.5 Sonnetwikitablequestionsaccuracy73codesota-api
TAPAS-largewikitablequestionsaccuracy48.7codesota-api
ViT-H/14 (JFT-300M)cifar-10accuracy99.5codesota-api
ViT-L/16 (JFT-300M)cifar-10accuracy99.42codesota-api
BiT-L (ResNet152x4)cifar-10accuracy99.37codesota-api
ViT-H/14 (IN-21K)cifar-10accuracy99.27codesota-api
deit-b-distilledcifar-10accuracy99.1codesota-api
ViT-L/16 (IN-21K)cifar-10accuracy99codesota-api
EfficientNet-B8 (NoisyStudent)cifar-10accuracy98.7codesota-api
convnext-v2-basecifar-10accuracy98.7codesota-api
ViT-B/16 (IN-21K)cifar-10accuracy98.13codesota-api
Swin-Bcifar-10accuracy98codesota-api
resnet-50cifar-10accuracy96.01codesota-api
SLCA (ViT-B/16)split-cifar100average_accuracy91.53codesota-api
DualPrompt (ViT-B/16)split-cifar100average_accuracy86.51codesota-api
L2P (ViT-B/16)split-cifar100average_accuracy83.86codesota-api
Claude 3.5 Sonnet (Oct 2024)mbpppass@191codesota-api
Qwen2.5-Coder-32B-Instructmbpppass@190.2codesota-api
DeepSeek-Coder-V2-Instructmbpppass@189.4codesota-api
claude-35-sonnetmbpppass@189.2codesota-api
gpt-4ombpppass@187.8codesota-api
GPT-4o (Aug 2024)mbpppass@186.8codesota-api
Qwen2.5-Coder-7B-Instructmbpppass@183.5codesota-api
Codestral 22B v0.1mbpppass@178.2codesota-api
Llama 4 Maverick (17B-128E)mbpppass@177.6codesota-api
DeepSeek-V3mbpppass@175.4codesota-api
Gemma 3 27B ITmbpppass@174.4codesota-api
Gemma 3 12B ITmbpppass@173codesota-api
Llama 4 Scout (17B-16E)mbpppass@167.8codesota-api
Gemma 3 4B ITmbpppass@163.2codesota-api
GPT-4 + AlphaCodiumcodecontestspass@144codesota-api
AlphaCode 2codecontestspass@143codesota-api
GPT-4codecontestspass@119codesota-api
P>M>F (ViT-B, DINO pretrained)mini-imagenet-5way5shotaccuracy95.3codesota-api
FEAT (ResNet-12)mini-imagenet-5way5shotaccuracy82.05codesota-api
SSF (ViT-B/16)vtab-1kmean_accuracy73.1codesota-api
VPT-Deep (ViT-B/16)vtab-1kmean_accuracy72codesota-api
DeBERTa-v3-largeglue-fill-maskavg-score91.37codesota-api
ALBERT-xxlarge-v2glue-fill-maskavg-score89.4codesota-api
RoBERTa-largeglue-fill-maskavg-score88.5codesota-api
SimpleNetmvtec-adImage AUROC99.6codesota-api
simplenetmvtec-adauroc99.6codesota-api
fastflowmvtec-adauroc99.4codesota-api
patchcoremvtec-adauroc99.1codesota-api
efficientadmvtec-adauroc99.1codesota-api
PatchCoremvtec-adImage AUROC99.1codesota-api
EfficientADmvtec-adImage AUROC99.1codesota-api
reverse-distillationmvtec-adauroc98.5codesota-api
cflow-admvtec-adauroc98.3codesota-api
draemmvtec-adauroc98codesota-api
padimmvtec-adauroc97.9codesota-api
SVM + hand-crafted featuresgdxray-weldsAccuracy95.2codesota-api
ResNet50 CNNgdxray-weldsAccuracy90.26codesota-api
LlamaParse Agenticparsebenchaccuracy84.9codesota-api
LlamaParse Cost Effectiveparsebenchaccuracy71.9codesota-api
Google Gemini 3 Flashparsebenchaccuracy71codesota-api
Reductoparsebenchaccuracy67.8codesota-api
Qwen 3 VLparsebenchaccuracy62codesota-api
Azure Document Intelligenceparsebenchaccuracy59.6codesota-api
Extendparsebenchaccuracy55.8codesota-api
Dots OCR 1.5parsebenchaccuracy55.8codesota-api
Doclingparsebenchaccuracy50.6codesota-api
Google Cloud Document AIparsebenchaccuracy50.4codesota-api
AWS Textractparsebenchaccuracy47.9codesota-api
OpenAI GPT-5 Miniparsebenchaccuracy46.8codesota-api
LandingAIparsebenchaccuracy45.2codesota-api
Anthropic Haiku 4.5parsebenchaccuracy45.2codesota-api
swin-v2-largeimagenet-v2top-1-accuracy84microsoft-research
convnext-v2-hugeimagenet-v2top-1-accuracy80.5meta-research
patchcorevisaauroc92.1research-paper
simplenetvisaauroc95.5research-paper
efficientadvisaauroc94.8research-paper
o3 (high)mathaccuracy98.1src
o4-mini (high)mathaccuracy98.2src
o3-minimathaccuracy97.9src
o3mathaccuracy97.8src
o4-minimathaccuracy97.5src
DeepSeek-R1mathaccuracy97.3src
Gemini 2.5 Promathaccuracy97.3src
o1mathaccuracy96.4src
Claude 3.7 Sonnetmathaccuracy96.2src
Kimi k1.5mathaccuracy96.2src
DeepSeek-R1-Zeromathaccuracy95.9src
DeepSeek-R1-Distill-Llama-70Bmathaccuracy94.5src
DeepSeek-R1-Distill-Qwen-32Bmathaccuracy94.3src
DeepSeek-V3-0324mathaccuracy94src
QwQ-32Bmathaccuracy90.6src
deepseek-v3mathaccuracy90.2src
o1-minimathaccuracy90src
GPT-4.5 Previewmathaccuracy87.1src
o1-previewmathaccuracy85.5src
GPT-4.1mathaccuracy82.1src
gpt-4omathaccuracy76.6src
Grok 2mathaccuracy76.1src
Llama 3.1 405Bmathaccuracy73.8src
GPT-4 Turbomathaccuracy73.4src
claude-35-sonnetmathaccuracy71.1src
gpt-4o-minimathaccuracy70.2src
Llama 3.1 70Bmathaccuracy68src
gemini-15-promathaccuracy67.7src
Claude 3 Opusmathaccuracy60.1src
U-Net Ensemble (Pavlov)severstal-steelDice0.903src
2nd Place Solutionseverstal-steelDice0.9084src
bestfitting (1st place ensemble)severstal-steelDice0.90883src
o1-previewgsm8kaccuracy97.8src
claude-35-sonnetgsm8kaccuracy96.4src
llama-3-70bgsm8kaccuracy93src
gpt-4ogsm8kaccuracy92src
gemini-15-progsm8kaccuracy91.7src
o3mmluaccuracy92.9src
o1mmluaccuracy91.8src
gpt-45-previewmmluaccuracy90.8src
o1-previewmmluaccuracy90.8src
gpt-41mmluaccuracy90.2src
o4-minimmluaccuracy90src
llama-31-405bmmluaccuracy88.6src
deepseek-v3mmluaccuracy88.5src
claude-35-sonnetmmluaccuracy88.3src
grok-2mmluaccuracy87.5src
gpt-4ommluaccuracy87.2src
claude-3-opusmmluaccuracy86.8src
gpt-4-turbommluaccuracy86.7src
gemini-15-prommluaccuracy85.9src
o3-minimmluaccuracy85.9src
o1-minimmluaccuracy85.2src
llama-31-70bmmluaccuracy82src
gpt-4o-minimmluaccuracy82src
llama-3-70bmmluaccuracy82src
o3gpqaaccuracy82.8src
o4-minigpqaaccuracy77.6src
o1gpqaaccuracy75.7src
o3-minigpqaaccuracy74.9src
o1-previewgpqaaccuracy73.3src
gpt-45-previewgpqaaccuracy69.5src
gpt-41gpqaaccuracy66.3src
o1-minigpqaaccuracy60src
claude-35-sonnetgpqaaccuracy59.4src
grok-2gpqaaccuracy56src
llama-31-405bgpqaaccuracy50.7src
claude-3-opusgpqaaccuracy50.4src
gpt-4ogpqaaccuracy49.9src
gpt-4-turbogpqaaccuracy49.3src
gemini-15-progpqaaccuracy46.2src
llama-31-70bgpqaaccuracy41.7src
gpt-4o-minigpqaaccuracy40.2src
o1-previewaime-2024accuracy83.3src
claude-35-opusaime-2024accuracy16src
gpt-4oaime-2024accuracy13.4src
Claude Opus 4.7swe-bench-verifiedresolve-rate87.6vendor
Claude Opus 4.5swe-bench-verifiedresolve-rate80.9src
Claude Opus 4.6swe-bench-verifiedresolve-rate80.8src
Gemini 3.1 Proswe-bench-verifiedresolve-rate80.6src
MiniMax M2.5swe-bench-verifiedresolve-rate80.2src
GPT-5.2 Thinkingswe-bench-verifiedresolve-rate80src
Claude Sonnet 4.6swe-bench-verifiedresolve-rate79.6src
Gemini 3 Flashswe-bench-verifiedresolve-rate78src
Claude Sonnet 4.5swe-bench-verifiedresolve-rate77.2src
Kimi K2.5swe-bench-verifiedresolve-rate76.8src
GPT-5.1swe-bench-verifiedresolve-rate76.3src
Gemini 3 Proswe-bench-verifiedresolve-rate76.2src
GPT-5swe-bench-verifiedresolve-rate74.9src
MiniMax M2.1swe-bench-verifiedresolve-rate74src
Claude Haiku 4.5swe-bench-verifiedresolve-rate73.3src
Claude Sonnet 4swe-bench-verifiedresolve-rate72.7src
Claude Opus 4swe-bench-verifiedresolve-rate72.5src
Devstral 2swe-bench-verifiedresolve-rate72.2src
Qwen3-Coder-480Bswe-bench-verifiedresolve-rate69.6src
MiniMax M2swe-bench-verifiedresolve-rate69.4src
o3swe-bench-verifiedresolve-rate69.1src
o4-miniswe-bench-verifiedresolve-rate68.1src
DeepSeek V3.1swe-bench-verifiedresolve-rate66src
Kimi K2swe-bench-verifiedresolve-rate65.8src
Grok 3swe-bench-verifiedresolve-rate63.8src
Gemini 2.5 Proswe-bench-verifiedresolve-rate63.8src
Claude 3.7 Sonnetswe-bench-verifiedresolve-rate63.7src
Gemini 2.5 Flashswe-bench-verifiedresolve-rate60.4src
DeepSeek R1-0528swe-bench-verifiedresolve-rate57.6src
o3-miniswe-bench-verifiedresolve-rate55.8src
GPT-4.1swe-bench-verifiedresolve-rate54.6src
Claude 3.5 Sonnetswe-bench-verifiedresolve-rate50.8src
DeepSeek-R1swe-bench-verifiedresolve-rate49.2src
o1swe-bench-verifiedresolve-rate48.9src
Devstral Small 2505swe-bench-verifiedresolve-rate46.8src
DeepSeek V3swe-bench-verifiedresolve-rate42src
GPT-4oswe-bench-verifiedresolve-rate41.2src
Claude 3.5 Haikuswe-bench-verifiedresolve-rate40.6src
DeepSeek V2.5swe-bench-verifiedresolve-rate37src
co-detr-swin-lcocomAP66src
internimage-hcocomAP65.4src
Focal-Stable-DINOcocomAP64.6src
dino-swin-lcocomAP63.3src
EVA-02-LcocomAP62.3src
RF-DETR-2XLcocomAP60.1src
D-FINE-X (Objects365)cocomAP59.3src
yolov10-xcocomAP57.4src
RT-DETRv4-XcocomAP57src
DINO-X PrococomAP56src
D-FINE-XcocomAP55.8src
YOLOv9-EcocomAP55.6src
efficientdet-d7-xcocomAP55.1src
YOLO11xcocomAP54.7src
RT-DETRv3-R101cocomAP54.6src
RT-DETRv2-XcocomAP54.3src
Grounding DINO 1.5 PrococomAP54.3src
gemini-15-procc-ocrmulti-scene-f183.25src
gemini-15-procc-ocrmultilingual-f178.97src
qwen2-vl-72bcc-ocrmulti-scene-f177.95src
internvl2-76bcc-ocrmulti-scene-f176.92src
gpt-4occ-ocrmulti-scene-f176.4src
gpt-4occ-ocrmultilingual-f173.44src
claude-35-sonnetcc-ocrmulti-scene-f172.87src
qwen2-vl-72bcc-ocrkie-f171.76src
gemini-15-procc-ocrkie-f167.28src
claude-35-sonnetcc-ocrkie-f164.58src
gpt-4occ-ocrkie-f163.45src
gemini-15-procc-ocrdocument-parsing62.37src
paddleocrkitab-benchcer0.79src
easyocrkitab-benchcer0.58src
tesseractkitab-benchcer0.54src
azure-ocrkitab-benchcer0.52src
gpt-4o-minikitab-benchcer0.43src
gpt-4okitab-benchcer0.31src
ain-7bkitab-benchcer0.2src
gemini-20-flashkitab-benchcer0.13src
claude-sonnet-4thaiocrbenchted-score0.84src
gemini-25-prothaiocrbenchted-score0.77src
qwen25-vl-32bthaiocrbenchted-score0.765src
internvl3-14bthaiocrbenchted-score0.76src
qwen25-vl-72bthaiocrbenchted-score0.72src
gemini-25-promme-videoocrtotal-accuracy73.7src
qwen25-vl-72bmme-videoocrtotal-accuracy69src
internvl3-78bmme-videoocrtotal-accuracy67.2src
gpt-4omme-videoocrtotal-accuracy66.4src
gemini-15-promme-videoocrtotal-accuracy64.9src
qwen25-vl-32bmme-videoocrtotal-accuracy61src
chexpert-auc-maximizerchexpertauroc93src
biovilchexpertauroc89.1src
chexzerochexpertauroc88.6src
gloriachexpertauroc88.2src
medclipchexpertauroc87.8src
torchxrayvisionchexpertauroc87.4src
densenet-121-cxrchexpertauroc86.5src
densenet-121-cxrrsna-pneumoniaauroc88.5src
chexnetrsna-pneumoniaauroc87.2src
torchxrayvisionnih-chestxray14auroc85.8src
chexnetnih-chestxray14auroc84.1src
densenet-121-cxrnih-chestxray14auroc82.6src
resnet-50-cxrnih-chestxray14auroc80.4src
gpt-4osvampaccuracy93.7src
claude-35-sonnetsvampaccuracy91.2src
llama-3-70bsvampaccuracy89.5src
claude-35-sonnetarc-challengeaccuracy96.7src
gpt-4oarc-challengeaccuracy96.4src
gemini-15-proarc-challengeaccuracy94.8src
llama-3-70barc-challengeaccuracy93src
gpt-4ocommonsenseqaaccuracy85.4src
claude-35-sonnetcommonsenseqaaccuracy83.2src
llama-3-70bcommonsenseqaaccuracy80.9src
gpt-4owinograndeaccuracy87.5src
claude-35-sonnetwinograndeaccuracy85.4src
llama-3-70bwinograndeaccuracy85.3src
gpt-4ohellaswagaccuracy95.3src
gemini-15-prohellaswagaccuracy92.5src
claude-35-sonnethellaswagaccuracy89src
llama-3-70bhellaswagaccuracy88src
gpt-4ohotpotqaf171.3src
claude-35-sonnethotpotqaf168.5src
gpt-4ologiqaaccuracy56.3src
claude-35-sonnetlogiqaaccuracy53.8src
gpt-4orecloraccuracy72.4src
claude-35-sonnetrecloraccuracy68.9src
gpt-4ostrategyqaaccuracy82.1src
claude-35-sonnetstrategyqaaccuracy79.8src
gpt-4omawpsaccuracy97.2src
claude-35-sonnetmawpsaccuracy95.8src
llama-3-70bmawpsaccuracy94.1src
plymouth-dl-modelabide-iaccuracy98src
mcbertabide-iaccuracy93.4src
ae-fcnabide-iaccuracy85src
asd-swnetabide-iauc81src
braingnnabide-iauc78.7src
multi-atlas-dnnabide-iaccuracy78.07src
gcnabide-iauc78src
svm-connectivityabide-iauc77src
asd-swnetabide-iaccuracy76.52src
maacnnabide-iaccuracy75.12src
al-negatabide-iaccuracy74.7src
braingnnabide-iaccuracy73.3src
gcnabide-iaccuracy72.2src
multi-task-transformerabide-iaccuracy72src
phgcl-ddgformerabide-iaccuracy70.9src
svm-connectivityabide-iaccuracy70.1src
deep-learning-heinsfeldabide-iaccuracy70src
mvs-gcnabide-iaccuracy69.38src
mvs-gcnabide-iauc69.01src
abraham-connectomesabide-iaccuracy67src
random-forestabide-iaccuracy63src
deepasdabide-iiauc93src
maacnnabide-iiaccuracy72.88src
mistral-ocr-3internal-mistraloverall-accuracy94.9src
mistral-ocr-3ocr-cer-benchmarkcer3.7src
mistral-ocr-3ocr-wer-benchmarkwer7.1src
chexzeromimic-cxrauroc89.2src
torchxrayvisionmimic-cxrauroc86.3src
convirtmimic-cxrauroc85.7src
rad-dinovindr-cxrauroc91.2src
torchxrayvisionvindr-cxrauroc87.9src
torchxrayvisionpadchestauroc84.6src
densenet-121-cxrcovid-chestxrayauroc94.7src
torchxrayvisioncovid-chestxrayauroc93.2src
yolov8-weldweld-defect-xraymap87.3src
defectdet-resnetneu-detmap78.4src
mistral-ocr-2512codesota-verificationpages-per-second1.22src
ONE-PEACEade20kmIoU63src
internimage-hade20kmIoU62.9src
ViT-Adapter-L (BEiT-3)ade20kmIoU62.8src
ViT-CoMer-Lade20kmIoU62.1src
DINOv2 ViT-g/14 + Mask2Formerade20kmIoU60.2src
EVA-02-L + UperNetade20kmIoU60.1src
EoMT-L (DINOv2)ade20kmIoU59.5src
OneFormer (DiNAT-L)ade20kmIoU58.3src
mask2former-swin-lade20kmIoU57.3src
Swin-L + UperNetade20kmIoU53.5src
SegMAN-Lade20kmIoU53.2src
SegFormer-B5ade20kmIoU51.8src
SeMask-Lade20kmIoU49.35src
Codex / GPT-5.5terminal-bench-2accuracy82terminal-bench-official
ForgeCode / GPT-5.4terminal-bench-2accuracy81.8terminal-bench-official
TongAgents / Gemini 3.1 Proterminal-bench-2accuracy80.2terminal-bench-official
ForgeCode / Claude Opus 4.6terminal-bench-2accuracy79.8terminal-bench-official
SageAgent / GPT-5.3-Codexterminal-bench-2accuracy78.4terminal-bench-official
ForgeCode / Gemini 3.1 Proterminal-bench-2accuracy78.4terminal-bench-official
Droid / GPT-5.3-Codexterminal-bench-2accuracy77.3terminal-bench-official
Capy / Claude Opus 4.6terminal-bench-2accuracy75.3terminal-bench-official
Simple Codex / GPT-5.3-Codexterminal-bench-2accuracy75.1terminal-bench-official
Terminus-KIRA / Gemini 3.1 Proterminal-bench-2accuracy74.8terminal-bench-official
Terminus-KIRA / Claude Opus 4.6terminal-bench-2accuracy74.7terminal-bench-official
Mux / GPT-5.3-Codexterminal-bench-2accuracy74.6terminal-bench-official
MAYA-V2 / Claude 4.6 Opusterminal-bench-2accuracy72.1terminal-bench-official
TongAgents / Claude Opus 4.6terminal-bench-2accuracy71.9terminal-bench-official
Junie CLI / Multipleterminal-bench-2accuracy71terminal-bench-official
CodeBrain-1 / GPT-5.3-Codexterminal-bench-2accuracy70.3terminal-bench-official
Droid / Claude Opus 4.6terminal-bench-2accuracy69.9terminal-bench-official
Ante / Gemini 3 Proterminal-bench-2accuracy69.4terminal-bench-official
IndusAGI Coding Agent / GPT-5.3-Codexterminal-bench-2accuracy69.1terminal-bench-official
Crux / Claude Opus 4.6terminal-bench-2accuracy66.9terminal-bench-official
Fig 1 · All 1008 scored runs in the OCR register. Each row preserves the submission’s reported metric, numeric value and cited source verbatim.
§ 02 · By dataset

Results grouped by benchmark.

Dataset | Results
demon-bench | 88
ocrbench-v2 | 74
olmocr-bench | 55
omnidocbench | 47
swe-bench-verified | 39
humaneval | 33
math | 29
vqa-v2 | 23
livecodebench | 22
imagenet-1k | 22
cora | 21
abide-i | 21
terminal-bench-2 | 20
mmlu | 19
ogb | 17
gpqa | 17
coco | 17
atari-2600 | 16
cifar-100 | 15
wildasr | 14
mbpp | 14
parsebench | 14
voicebench | 13
funsd | 13
ade20k | 13
mujoco | 12
cc-ocr | 12
cifar-10 | 11
mvtec-ad | 11
textvqa | 9
imagenet-linear-probe | 8
iam | 8
kitab-bench | 8
chexpert | 7
mteb | 6
mme-videoocr | 6
audiobench | 5
nyu-depth-v2 | 5
rimes | 5
audiocaps | 5
kitti-depth | 5
humaneval-plus | 5
ok-vqa | 5
musiccaps | 5
nocaps | 5
gsm8k | 5
thaiocrbench | 5
beir | 4
ms-marco | 4
vbench | 4
audioset | 4
imagenet-zero-shot | 4
mbpp-plus | 4
kinetics-400 | 4
wmt23 | 4
sa-1b | 4
esc-50 | 4
nih-chestxray14 | 4
arc-challenge | 4
hellaswag | 4
robustbench-cifar10-linf | 3
lvis-zero-shot | 3
apps | 3
sts-benchmark | 3
severstal-steel-defect | 3
carla-leaderboard | 3
rlbench | 3
cityscapes | 3
medqa-usmle | 3
robustbench-cifar10-linf-attack | 3
coco-keypoints | 3
moleculenet-bbbp | 3
xnli | 3
audiocaps-t2a | 3
d4rl-halfcheetah-medium | 3
libero-long | 3
ucr-archive | 3
imagenet-knn | 3
wikitablequestions | 3
split-cifar100 | 3
codecontests | 3
glue-fill-mask | 3
visa | 3
severstal-steel | 3
aime-2024 | 3
svamp | 3
commonsenseqa | 3
winogrande | 3
mawps | 3
mimic-cxr | 3
habitat-objectnav-hm3d | 2
race | 2
dagm-2007 | 2
mini-imagenet-5way5shot | 2
vtab-1k | 2
gdxray-welds | 2
imagenet-v2 | 2
rsna-pneumonia | 2
hotpotqa | 2
logiqa | 2
reclor | 2
strategyqa | 2
abide-ii | 2
vindr-cxr | 2
covid-chestxray | 2
internal-mistral | 1
ocr-cer-benchmark | 1
ocr-wer-benchmark | 1
padchest | 1
weld-defect-xray | 1
neu-det | 1
codesota-verification | 1
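
The per-dataset counts above amount to grouping the register rows by dataset and counting them. Below is a minimal sketch in Python, assuming each row is a dict whose keys mirror the § 01 column headers. The key names and the `count_by_dataset` helper are illustrative assumptions, not the published schema of the raw data file. The three sample rows are copied from the register.

```python
from collections import Counter

# Sample rows copied from the register above.
# The dict keys are assumed field names, not a confirmed schema.
rows = [
    {"model": "TrOCR-base", "dataset": "iam", "metric": "cer",
     "value": 3.42, "source": "codesota-api"},
    {"model": "TrOCR-large", "dataset": "iam", "metric": "cer",
     "value": 2.89, "source": "codesota-api"},
    {"model": "GPT-4", "dataset": "wmt23", "metric": "comet",
     "value": 84.1, "source": "codesota-api"},
]


def count_by_dataset(rows):
    """Return (dataset, row count) pairs, largest count first."""
    counts = Counter(row["dataset"] for row in rows)
    # Ties are broken alphabetically here; that tie-break is a choice
    # made for this sketch and may differ from the page's ordering.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


dataset_counts = count_by_dataset(rows)
print(dataset_counts)  # [('iam', 2), ('wmt23', 1)]
```

Run against the full register, the same grouping should yield one pair per dataset, with the counts summing to the total number of scored runs.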
§ 03 · Pending verification

Claims awaiting reproduction.

These scores appear in published papers or vendor blog posts but have not yet been re-run against the canonical test split. They are kept visible as signal, but are not treated as evidence.

Model | Dataset | Claimed value | Status
trocr-large | sroie | 96.58 | needs-pdf-verification
trocr-large | iam | 2.89 | needs-pdf-verification
paddleocr-v4 | icdar-2015 | | needs-documentation-verification
polish-roberta-ocr | poleval-2021-ocr | |
polish-t5-ocr | poleval-2021-ocr | |
herbert | poleval-2021-ocr | |
abbyy-finereader | impact-psnc | |
tesseract-polish | impact-psnc | |
abbyy-finereader | impact-psnc | |
tesseract-polish | impact-psnc | |
tesseract-polish | codesota-polish | |
tesseract-polish | codesota-polish | |
tesseract-polish | codesota-polish | |
tesseract-polish | codesota-polish-wikipedia | |
tesseract-polish | codesota-polish-real | |
tesseract-polish | codesota-polish-synth-random | |
tesseract-polish | codesota-polish-synth-words | |
claude-sonnet-4 | swe-bench-verified | |
claude-sonnet-4-high-compute | swe-bench-verified | |
claude-opus-4.5 | swe-bench-verified | |
o3 | swe-bench-verified | |
claude-3.7-sonnet | swe-bench-verified | |
claude-3.5-sonnet | swe-bench-verified | |
o1 | swe-bench-verified | |
gpt-4o | swe-bench-verified | |
o3 | aime-2024 | |
o1 | aime-2024 | |
deepseek-r1 | aime-2024 | |
o1 | aime-2024 | |
gpt-4o | aime-2024 | |
o3 | gpqa-diamond | |
gemini-2.5-pro | gpqa-diamond | |
o1 | gpqa-diamond | |
o3-mini | gpqa-diamond | |
claude-3.5-sonnet | gpqa-diamond | |
gpt-4o | gpqa-diamond | |
§ Final · Methodology

How these numbers stay honest.

Self-reported scores are recorded and labelled claim-only until they are reproduced. Closed API models are run against the public split through their official endpoint, with the model identifier and access date recorded. See the full methodology for what counts as a verified run.
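
The labelling rule in the paragraph above can be sketched as a small record type. This is only an illustration under stated assumptions: the `Result` class, its field names, and the `status` and `missing_provenance` helpers are hypothetical and do not describe Codesota's actual pipeline. The first example uses the pending trocr-large claim on sroie; the second uses placeholder values throughout.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Result:
    """One register entry (hypothetical structure for illustration)."""
    model: str
    dataset: str
    value: float
    reproduced: bool = False                 # True once the score has been re-run
    model_identifier: Optional[str] = None   # recorded for closed API models
    access_date: Optional[date] = None       # recorded for closed API models


def status(result: Result) -> str:
    """Self-reported scores stay claim-only until they are reproduced."""
    return "verified" if result.reproduced else "claim-only"


def missing_provenance(result: Result) -> List[str]:
    """List the API provenance fields that have not been recorded yet."""
    missing = []
    if result.model_identifier is None:
        missing.append("model_identifier")
    if result.access_date is None:
        missing.append("access_date")
    return missing


# A pending claim from § 03: not yet reproduced, so it stays claim-only.
pending = Result(model="trocr-large", dataset="sroie", value=96.58)

# Placeholder values only; this is not a real run.
api_run = Result(
    model="example-api-model",
    dataset="example-dataset",
    value=0.0,
    reproduced=True,
    model_identifier="example-api-model-id",
    access_date=date(2026, 1, 15),
)

print(status(pending), missing_provenance(pending))
print(status(api_run), missing_provenance(api_run))
```

Keeping the reproduction flag separate from the provenance fields lets a claim stay visible in the register without being counted as evidence.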


Related OCR reading