Codesota · Models · 1,357 models indexed · 896 match filter
Every model, measured.
Start with a research area, drill into a vendor, or page through the full index. Only models with at least one benchmark score appear — a model without a recorded score can’t be ranked.
Vendor: Areas overview; speakleash · 253; OpenAI · 85; Google · 71; Qwen · 52; Alibaba · 47; Anthropic · 44; Microsoft · 35; Meta · 30; Mistral · 30; DeepSeek · 28; google · 19; meta-llama · 19; mistralai · 19; Meta AI · 15; CYFRAGOVPL · 14; Zhipu AI · 13; NVIDIA · 10; SpeakLeash · 10; internlm · 10; xAI · 10; ByteDance · 9; Baidu · 8; PLLuM · 8; ibm-granite · 8; microsoft · 8; Amazon · 7; Google DeepMind · 7; MiniMax · 7; Mistral AI · 7; Remek · 7; Shanghai AI Lab · 7; allenai · 7; utter-project · 7; CohereForAI · 6; Microsoft Research · 6; Salesforce · 6; 01-ai · 5; Alibaba Cloud · 5; Cohere · 5; Moonshot AI · 5; NousResearch · 5; THUML · 5; deepseek-ai · 5; DeepMind · 4; Facebook AI · 4; IBM · 4; Meituan · 4; Stanford · 4; THUDM · 4; UC San Diego · 4; VikParuchuri · 4; gguf-iq · 4; nvidia · 4; openchat · 4; tiiuae · 4; Allen AI · 3; BAAI · 3; Du et al. · 3; ForgeCode · 3; Fudan University · 3; IDEA Research · 3; Liao et al. · 3; Moonshot.AI · 3; Nam Tuan Ly / NII · 3; OPI-PG · 3; OpenDataLab · 3; ViCoS Lab Ljubljana · 3; Xiaomi · 3; Zhao et al. · 3; gguf · 3; gguf11bv30 · 3; gguf7bv30 · 3; upstage · 3; + 247 smaller vendors (291 models)
§ 01 · Computer Vision models
896 models in Computer Vision · page 1 of 18.
| # | Model | Vendor | Parameters | Architecture | SOTA | Benchmarks | Results |
|---|---|---|---|---|---|---|---|
| 001 | GPT-4o | OpenAI | Undisclosed | Multimodal LLM | 15 | 45 | 57 |
| 002 | HTLM (fine-tuning) | Unknown | Unknown | Unknown | 11 | 5 | 20 |
| 003 | fglihai | Unknown | Unknown | Unknown | 11 | 2 | 12 |
| 004 | GPT-2-Large (fine-tuning) | Unknown | Unknown | Unknown | 7 | 5 | 20 |
| 005 | ELSC | Unknown | Unknown | Unknown | 7 | 7 | 7 |
| 006 | Hybrid DLA (Shehzadi et al.) | DFKI / TU Kaiserslautern | Unknown | Transformer object detector with query encoding + hybrid one-to-one/one-to-many matching | 6 | 1 | 6 |
| 007 | StackMix+Blots | Unknown | Unknown | Unknown | 6 | 6 | 6 |
| 008 | TextFuseNet (ResNeXt-101) | Unknown | Unknown | Unknown | 5 | 6 | 16 |
| 009 | USYD NLP_CS29-2 | Unknown | Unknown | Unknown | 5 | 1 | 6 |
| 010 | XLMft UDA | Unknown | Unknown | Unknown | 5 | 5 | 5 |
| 011 | CRAFT | Unknown | Unknown | Unknown | 4 | 7 | 21 |
| 012 | CLIP4STR-L (DataComp-1B) | Unknown | Unknown | Unknown | 4 | 9 | 9 |
| 013 | DINOv3 (7B) | — | — | — | 4 | 8 | 8 |
| 014 | DTrOCR 105M | Unknown | Unknown | Unknown | 4 | 8 | 8 |
| 015 | ApproxRepSet | Unknown | Unknown | Unknown | 4 | 6 | 6 |
| 016 | CCD-ViT-Small | Unknown | Unknown | Unknown | 4 | 4 | 5 |
| 017 | T5B Baseline | Unknown | Unknown | Unknown | 4 | 1 | 5 |
| 018 | GFCN | Unknown | Unknown | Unknown | 4 | 2 | 4 |
| 019 | Claude Sonnet 4 | Anthropic | — | Multimodal LLM | 3 | 15 | 21 |
| 020 | Gemini 1.5 Pro | | — | Multimodal LLM | 3 | 17 | 21 |
| 021 | Gemini 2.5 Pro | | — | Multimodal LLM | 3 | 15 | 16 |
| 022 | Qianfan-OCR | Baidu Qianfan | 4B | End-to-end VLM (4B params) | 3 | 4 | 16 |
| 023 | LightOnOCR-2-1B | LightOn | 1B | Vision-Language Model (1B params) | 3 | 1 | 9 |
| 024 | EDD | Unknown | Unknown | Unknown | 3 | 3 | 7 |
| 025 | VGT | Unknown | Unknown | Unknown | 3 | 2 | 7 |
| 026 | BRIO | Yale NLP | Unknown | BART-large with contrastive learning objective | 3 | 2 | 6 |
| 027 | BigBird-Pegasus | Unknown | Unknown | Unknown | 3 | 2 | 6 |
| 028 | Habitat-Web | Unknown | Unknown | Unknown | 3 | 2 | 6 |
| 029 | UNITS | Unknown | Unknown | Unknown | 3 | 2 | 5 |
| 030 | Optimized Text CNN | Unknown | Unknown | Unknown | 3 | 2 | 4 |
| 031 | AKHCRNet | Unknown | Unknown | Unknown | 3 | 1 | 3 |
| 032 | BPDO | Zheng et al. | Unknown | ResNet-50 + FPN + DCN + Text-Aware Module + Dynamic Optimization Module | 3 | 1 | 3 |
| 033 | CPN (Complementary Proposal Network) | Longhuang Wu et al. | Unknown | Deformable Morphology Semantic Network + Balanced Region Proposal Network + Interleaved Feature Attention | 3 | 1 | 3 |
| 034 | CodeTrans-MT-Base | Unknown | Unknown | Unknown | 3 | 3 | 3 |
| 035 | ContourNet [69] | Unknown | Unknown | Unknown | 3 | 1 | 3 |
| 036 | DAT-SEG | Wan et al. (Baidu) | Unknown | Interactive attention transformer with segmentation head for multi-granularity text detection | 3 | 1 | 3 |
| 037 | HierarchicalEncoder + NR + IR | Unknown | Unknown | Unknown | 3 | 1 | 3 |
| 038 | PCGAN-CHAR | Unknown | Unknown | Unknown | 3 | 3 | 3 |
| 039 | Segment Anything Model (SAM) | Unknown | — | — | 3 | 3 | 3 |
| 040 | Claude 3.5 Sonnet | Anthropic | Undisclosed | Multimodal LLM | 2 | 27 | 32 |
| 041 | Qwen3.5-397B-A17B | Alibaba | — | — | 2 | 14 | 20 |
| 042 | Faster R-CNN | Microsoft Research | Unknown | Unknown | 2 | 4 | 19 |
| 043 | Qwen2-VL 72B | Alibaba | — | Vision-Language Model | 2 | 12 | 18 |
| 044 | CLIP4STR-L | Unknown | Unknown | Unknown | 2 | 10 | 10 |
| 045 | DAN | Unknown | Unknown | Unknown | 2 | 7 | 10 |
| 046 | Chandra v0.1.0 | datalab-to | 9B | Vision-Language OCR Model | 2 | 1 | 9 |
| 047 | Ovis2.5-9B | — | — | — | 2 | 8 | 9 |
| 048 | DETR | Meta AI / FAIR | Unknown | Unknown | 2 | 2 | 8 |
| 049 | FAST-T-512 | Unknown | Unknown | Unknown | 2 | 2 | 8 |
| 050 | DeepSolo (ViTAEv2-S, TextOCR) | Unknown | Unknown | Unknown | 2 | 3 | 7 |
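
The listing rules described above (only scored models appear, rows run from highest SOTA count down, and 896 matches span 18 pages) can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the site's actual implementation: the `Model` class, the function names, the page size of 50 (inferred from rows 001 to 050 on page 1), and the sort key (SOTA count, then result count, both descending, which matches the visible rows) are all hypothetical. The `unscored-model` entry is made up to show the filter.

```python
import math
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    sota: int        # value of the SOTA column
    benchmarks: int  # value of the Benchmarks column
    results: int     # value of the Results column


# Assumption: 50 rows per page, inferred from rows 001-050 on page 1.
PAGE_SIZE = 50


def rank(models):
    """Drop models with no recorded result, then sort by SOTA and Results, descending."""
    scored = [m for m in models if m.results > 0]
    return sorted(scored, key=lambda m: (-m.sota, -m.results))


def page_count(total, page_size=PAGE_SIZE):
    """Number of pages needed to show `total` rows."""
    return math.ceil(total / page_size)


def page(models, number, page_size=PAGE_SIZE):
    """Return the rows for a 1-indexed page number."""
    start = (number - 1) * page_size
    return models[start:start + page_size]


# Three rows copied from the table, plus one hypothetical unscored entry.
sample = [
    Model("Claude 3.5 Sonnet", 2, 27, 32),
    Model("unscored-model", 0, 0, 0),  # hypothetical: filtered out by rank()
    Model("GPT-4o", 15, 45, 57),
    Model("CRAFT", 4, 7, 21),
]

ranked = rank(sample)
print([m.name for m in ranked])
print(page_count(896))
```

With these assumptions, `page_count(896)` agrees with the "page 1 of 18" shown in the section header, and the unscored entry never reaches the ranked list.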