Codesota · Models · 1,357 models indexed · 896 match filter
Every model, measured.
Start with a research area, drill into a vendor, or page through the full index. Only models with at least one benchmark score appear — a model without a recorded score can’t be ranked.
Vendor: Areas overview, speakleash · 253, OpenAI · 85, Google · 71, Qwen · 52, Alibaba · 47, Anthropic · 44, Microsoft · 35, Meta · 30, Mistral · 30, DeepSeek · 28, google · 19, meta-llama · 19, mistralai · 19, Meta AI · 15, CYFRAGOVPL · 14, Zhipu AI · 13, NVIDIA · 10, SpeakLeash · 10, internlm · 10, xAI · 10, ByteDance · 9, Baidu · 8, PLLuM · 8, ibm-granite · 8, microsoft · 8, Amazon · 7, Google DeepMind · 7, MiniMax · 7, Mistral AI · 7, Remek · 7, Shanghai AI Lab · 7, allenai · 7, utter-project · 7, CohereForAI · 6, Microsoft Research · 6, Salesforce · 6, 01-ai · 5, Alibaba Cloud · 5, Cohere · 5, Moonshot AI · 5, NousResearch · 5, THUML · 5, deepseek-ai · 5, DeepMind · 4, Facebook AI · 4, IBM · 4, Meituan · 4, Stanford · 4, THUDM · 4, UC San Diego · 4, VikParuchuri · 4, gguf-iq · 4, nvidia · 4, openchat · 4, tiiuae · 4, Allen AI · 3, BAAI · 3, Du et al. · 3, ForgeCode · 3, Fudan University · 3, IDEA Research · 3, Liao et al. · 3, Moonshot.AI · 3, Nam Tuan Ly / NII · 3, OPI-PG · 3, OpenDataLab · 3, ViCoS Lab Ljubljana · 3, Xiaomi · 3, Zhao et al. · 3, gguf · 3, gguf11bv30 · 3, gguf7bv30 · 3, upstage · 3, + 247 smaller vendors (291 models)
§ 01 · Computer Vision models
896 models in Computer Vision · page 11 of 18.
| # | Model | Vendor | Parameters | Architecture | SOTA | Benchmarks | Results |
|---|---|---|---|---|---|---|---|
| 501 | OrigamiNet-24 | Unknown | Unknown | Unknown | 1 | 2 | |
| 502 | PyLaia (all transcriptions + agreement-based split) | Unknown | Unknown | Unknown | 1 | 2 | |
| 503 | PyLaia (human transcriptions + agreement-based split) | Unknown | Unknown | Unknown | 1 | 2 | |
| 504 | PyLaia (rover consensus + agreement-based split) | Unknown | Unknown | Unknown | 1 | 2 | |
| 505 | Qwen2.5-VL 32B | Alibaba | — | Vision-Language Model | 2 | 2 | |
| 506 | Qwen3-VL-4B | Alibaba Qwen | 4B | Vision-Language Model (4B params) | 2 | 2 | |
| 507 | ReasTAP-Large | Unknown | Unknown | Unknown | 1 | 2 | |
| 508 | SANA | — | — | — | 1 | 2 | |
| 509 | SIGA_S | Unknown | Unknown | Unknown | 2 | 2 | |
| 510 | SLANet | Unknown | Unknown | Unknown | 1 | 2 | |
| 511 | SSD300 (VGG-16) | Google / UNC | ~24M | Single-shot multibox detector with VGG-16 backbone, 300x300 input | 1 | 2 | |
| 512 | Salience-aware TAPAS | Unknown | Unknown | Unknown | 1 | 2 | |
| 513 | SwinTextSpotter v2 | Academic | — | Swin Transformer, improved detection-recognition synergy | 1 | 2 | |
| 514 | T5-3b(UnifiedSKG) | Unknown | Unknown | Unknown | 1 | 2 | |
| 515 | TABLET | Anonymous (arXiv 2025) | Unknown | Dual Transformer encoders; encoder-only architecture; row/column splitting as sequence labeling | 1 | 2 | |
| 516 | TAPAS-Large classifier with Counterfactual + Synthetic pre-training | Unknown | Unknown | Unknown | 1 | 2 | |
| 517 | TAPEX-Large | Unknown | Unknown | Unknown | 1 | 2 | |
| 518 | TPSNet | Unknown | Unknown | Unknown | 1 | 2 | |
| 519 | TRUST | Unknown | Unknown | Unknown | 1 | 2 | |
| 520 | TabStruct-Net | Unknown | Unknown | Unknown | 1 | 2 | |
| 521 | Table NLM | Unknown | Unknown | Unknown | 1 | 2 | |
| 522 | Table-BERT-Horizontal-T+F-Template | Unknown | Unknown | Unknown | 1 | 2 | |
| 523 | UniTable Large | Georgia Tech (Peng et al.) | Unknown | ViT encoder + autoregressive decoder; self-supervised pretraining on unannotated tabular images | 1 | 2 | |
| 524 | VAI-OCR | Unknown | Unknown | Unknown | 1 | 2 | |
| 525 | ViT-B/16 | — | 86M | Vision Transformer | 2 | 2 | |
| 526 | ViTDet-H (MAE) | Meta AI | Unknown | Plain ViT-H backbone with simple feature pyramid, Cascade Mask RCNN head | 1 | 2 | |
| 527 | VideoPrism-g | — | — | — | 2 | 2 | |
| 528 | biCVM+ | Unknown | Unknown | Unknown | 2 | 2 | |
| 529 | claude-3.5-sonnet | Unknown | Unknown | Unknown | 1 | 2 | |
| 530 | dots.mocr | — | — | — | 2 | 2 | |
| 531 | gpt-4o-2024 | Unknown | Unknown | Unknown | 1 | 2 | |
| 532 | minicpm-v-4.5-8b | Unknown | Unknown | Unknown | 1 | 2 | |
| 533 | mistral-ocr-2512 | Unknown | Unknown | Unknown | 2 | 2 | |
| 534 | olmOCR v0.3.0 | Allen AI | — | OCR Pipeline | 1 | 2 | |
| 535 | sail-vl2-8b | Unknown | Unknown | Unknown | 1 | 2 | |
| 536 | Self-Attention + CTC + language model | Unknown | Unknown | Unknown | 1 | 1 | |
| 537 | 3DGP | Unknown | Unknown | Unknown | 1 | 1 | |
| 538 | ABCNet v2 | TPAMI 2021 | — | — | 1 | 1 | |
| 539 | AIMv2-3B | Apple | 2.7B | Vision Transformer (Autoregressive Pre-trained) | 1 | 1 | |
| 540 | AIN 7B | Research | — | Vision-Language Model | 1 | 1 | |
| 541 | ARTEMIS-DA | Unknown | Unknown | Unknown | 1 | 1 | |
| 542 | AWS Textract | Amazon Web Services | Unknown | Managed OCR + layout + table extraction service | 1 | 1 | |
| 543 | Abdallah | Unknown | Unknown | Unknown | 1 | 1 | |
| 544 | AlexNet | U. Toronto | — | — | 1 | 1 | |
| 545 | AlexNet + spatial pyramidal pooling + image resizing | Unknown | Unknown | Unknown | 1 | 1 | |
| 546 | Anthropic Haiku 4.5 | Anthropic | Unknown | Vision-language model (thinking enabled) | 1 | 1 | |
| 547 | ArabicNougat | community | — | — | 1 | 1 | |
| 548 | ArtDet-v2 | Sogou OCR team | Unknown | Scene text detector | 1 | 1 | |
| 549 | AttentionOCR_Inception-resnet-v2_Location | Unknown | Unknown | Unknown | 1 | 1 | |
| 550 | Azure Document Intelligence | Microsoft | Unknown | Managed layout + OCR extraction service | 1 | 1 | |
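
The row numbers on this page follow from simple pagination arithmetic. The sketch below is a minimal illustration, assuming a page size of 50 rows; that value is not stated on the page and is inferred only from rows 501 to 550 appearing on page 11. The function name `page_bounds` is hypothetical.

```python
import math

# Assumption: 50 rows per page, inferred from rows 501-550 on page 11.
PAGE_SIZE = 50


def page_bounds(page: int, total: int, page_size: int = PAGE_SIZE) -> tuple[int, int, int]:
    """Return (first_row, last_row, page_count), 1-indexed and inclusive."""
    page_count = math.ceil(total / page_size)
    if not 1 <= page <= page_count:
        raise ValueError(f"page must be between 1 and {page_count}")
    first_row = (page - 1) * page_size + 1
    # The final page may be shorter than page_size.
    last_row = min(page * page_size, total)
    return first_row, last_row, page_count


print(page_bounds(11, 896))  # (501, 550, 18)
print(page_bounds(18, 896))  # (851, 896, 18)
```

Under that assumption, 896 matching models give 18 pages, and the last page holds the remaining 46 rows.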