Codesota · Models
1,357 models indexed · 896 match filter
Every model, measured.
Start with a research area, drill into a vendor, or page through the full index. Only models with at least one benchmark score appear — a model without a recorded score can’t be ranked.
Vendor: Areas overview; speakleash · 253; OpenAI · 85; Google · 71; Qwen · 52; Alibaba · 47; Anthropic · 44; Microsoft · 35; Meta · 30; Mistral · 30; DeepSeek · 28; google · 19; meta-llama · 19; mistralai · 19; Meta AI · 15; CYFRAGOVPL · 14; Zhipu AI · 13; NVIDIA · 10; SpeakLeash · 10; internlm · 10; xAI · 10; ByteDance · 9; Baidu · 8; PLLuM · 8; ibm-granite · 8; microsoft · 8; Amazon · 7; Google DeepMind · 7; MiniMax · 7; Mistral AI · 7; Remek · 7; Shanghai AI Lab · 7; allenai · 7; utter-project · 7; CohereForAI · 6; Microsoft Research · 6; Salesforce · 6; 01-ai · 5; Alibaba Cloud · 5; Cohere · 5; Moonshot AI · 5; NousResearch · 5; THUML · 5; deepseek-ai · 5; DeepMind · 4; Facebook AI · 4; IBM · 4; Meituan · 4; Stanford · 4; THUDM · 4; UC San Diego · 4; VikParuchuri · 4; gguf-iq · 4; nvidia · 4; openchat · 4; tiiuae · 4; Allen AI · 3; BAAI · 3; Du et al. · 3; ForgeCode · 3; Fudan University · 3; IDEA Research · 3; Liao et al. · 3; Moonshot.AI · 3; Nam Tuan Ly / NII · 3; OPI-PG · 3; OpenDataLab · 3; ViCoS Lab Ljubljana · 3; Xiaomi · 3; Zhao et al. · 3; gguf · 3; gguf11bv30 · 3; gguf7bv30 · 3; upstage · 3; + 247 smaller vendors (291 models)
§ 01 · Computer Vision models
896 models in Computer Vision · page 4 of 18.
| # | Model | Vendor | Parameters | Architecture | SOTA | Benchmarks | Results |
|---|---|---|---|---|---|---|---|
| 151 | TFLOP | Upstage AI | Unknown | Layout Pointer mechanism; span-aware contrastive supervision; reformulates TSR as text region pointing | 1 | 1 | 2 |
| 152 | Tesseract | Google (Open Source) | — | Traditional OCR | 1 | 2 | 2 |
| 153 | ViT-H/14 | — | 632M | Vision Transformer | 1 | 2 | 2 |
| 154 | BDN | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 155 | BioRex+Directionality | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 156 | Bluche | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 157 | CDeCNet | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 158 | CNN | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 159 | CNN + BLSTM | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 160 | Co-DETR (Swin-L) | Research | — | Transformer Detector | 1 | 1 | 1 |
| 161 | Co-DETR (Swin-L) | Research | Unknown | Collaborative DETR + Swin-L backbone | 1 | 1 | 1 |
| 162 | CoCa (ViT-G/14) | — | 2.1B | Contrastive Captioner on ViT-G/14 | 1 | 1 | 1 |
| 163 | CoCa (finetuned) | — | 2.1B | Contrastive Captioner | 1 | 1 | 1 |
| 164 | ConvTextTM | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 165 | DAL | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 166 | DINOv3 + Mask2Former (simple) | — | — | — | 1 | 1 | 1 |
| 167 | DINOv3 + Plain-DETR + TTA | — | — | — | 1 | 1 | 1 |
| 168 | DOCmT5 | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 169 | DiT-L (Cascade) | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 170 | DocFormerv2-Large | Adobe Research | Unknown | Multimodal encoder with spatial-aware cross-attention | 1 | 1 | 1 |
| 171 | Document Classification Using Importance of Sentences | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 172 | EAML | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 173 | GCN Hybrid | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 174 | I2L-NOPOOL | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 175 | JDeskew | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 176 | KHCR | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 177 | LayoutLMv3 | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 178 | LlamaParse Agentic | LlamaIndex | Unknown | Agentic multi-step LlamaParse pipeline | 1 | 1 | 1 |
| 179 | MetaSelf-Learning | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 180 | Oracle-BERT | Unknown | — | oracle-extractive | 1 | 1 | 1 |
| 181 | Oracle-BERT (HowSumm-Method) | Unknown | — | — | 1 | 1 | 1 |
| 182 | PGNet-A | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 183 | PesRec | Xingwen Cao et al. (LIESMARS, Wuhan University) | — | Multi-task CNN: spatial layout estimator + 3D object detector + mesh generator | 1 | 1 | 1 |
| 184 | Proposed System (With post-processing) | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 185 | Q-SENN | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 186 | Query-doc RobeCzech (Roberta-base) | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 187 | REXEL | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 188 | ResNet-RS (ResNet-200 + RS training tricks) | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 189 | SENet | Momenta | — | — | 1 | 1 | 1 |
| 190 | STREET | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 191 | ScyllaNet | Scylla Technologies | — | — | 1 | 1 | 1 |
| 192 | Seed1.6-vision | ByteDance | — | Vision-Language Model | 1 | 1 | 1 |
| 193 | Siamese_MHCA_SA | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 194 | Siamese_MultiHeadCrossAttention_SoftAttention (Siamese_MHCA_SA) | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 195 | StarCoder-LoRA | BigCode / Salesforce | 15.5B | Transformer decoder | 1 | 1 | 1 |
| 196 | Swin Transformer V2 Large | Microsoft | 197M | Hierarchical Vision Transformer | 1 | 1 | 1 |
| 197 | TCM | CLIP-based | — | — | 1 | 1 | 1 |
| 198 | TabTracer | Unknown | — | — | 1 | 1 | 1 |
| 199 | Transformer w/ CNN | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 200 | VGG | Unknown | Unknown | Unknown | 1 | 1 | 1 |
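
The inclusion rule and the pagination shown above can be checked with a little arithmetic. The sketch below is illustrative only: the `Model` class, the function names, and the page size of 50 are assumptions (the page size is inferred from rows 151 to 200 appearing on page 4), not part of the site's actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Model:
    """Hypothetical record: a model name and its count of recorded scores."""
    name: str
    results: int


# Assumption: 50 rows per page, inferred from rows 151-200 on page 4.
PAGE_SIZE = 50


def rankable(models):
    """Keep only models with at least one recorded benchmark score."""
    return [m for m in models if m.results >= 1]


def page_count(total, page_size=PAGE_SIZE):
    """Number of pages needed to list `total` rows."""
    return math.ceil(total / page_size)


def page_bounds(page, total, page_size=PAGE_SIZE):
    """Return the 1-based (first, last) row numbers shown on `page`."""
    first = (page - 1) * page_size + 1
    last = min(first + page_size - 1, total)  # the final page may be short
    return first, last


print(page_count(896))      # 18
print(page_bounds(4, 896))  # (151, 200)
```

Under these assumptions, 896 matching models yield 18 pages, and page 4 covers rows 151 through 200, which agrees with the listing.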