Codesota · Models · 1,357 models indexed · 896 match filter
Every model, measured.
Start with a research area, drill into a vendor, or page through the full index. Only models with at least one benchmark score appear — a model without a recorded score can’t be ranked.
Vendor: speakleash · 253; OpenAI · 85; Google · 71; Qwen · 52; Alibaba · 47; Anthropic · 44; Microsoft · 35; Meta · 30; Mistral · 30; DeepSeek · 28; google · 19; meta-llama · 19; mistralai · 19; Meta AI · 15; CYFRAGOVPL · 14; Zhipu AI · 13; NVIDIA · 10; SpeakLeash · 10; internlm · 10; xAI · 10; ByteDance · 9; Baidu · 8; PLLuM · 8; ibm-granite · 8; microsoft · 8; Amazon · 7; Google DeepMind · 7; MiniMax · 7; Mistral AI · 7; Remek · 7; Shanghai AI Lab · 7; allenai · 7; utter-project · 7; CohereForAI · 6; Microsoft Research · 6; Salesforce · 6; 01-ai · 5; Alibaba Cloud · 5; Cohere · 5; Moonshot AI · 5; NousResearch · 5; THUML · 5; deepseek-ai · 5; DeepMind · 4; Facebook AI · 4; IBM · 4; Meituan · 4; Stanford · 4; THUDM · 4; UC San Diego · 4; VikParuchuri · 4; gguf-iq · 4; nvidia · 4; openchat · 4; tiiuae · 4; Allen AI · 3; BAAI · 3; Du et al. · 3; ForgeCode · 3; Fudan University · 3; IDEA Research · 3; Liao et al. · 3; Moonshot.AI · 3; Nam Tuan Ly / NII · 3; OPI-PG · 3; OpenDataLab · 3; ViCoS Lab Ljubljana · 3; Xiaomi · 3; Zhao et al. · 3; gguf · 3; gguf11bv30 · 3; gguf7bv30 · 3; upstage · 3; + 247 smaller vendors (291 models)
§ 01 · Computer Vision models
896 models in Computer Vision · page 13 of 18.
| # | Model | Vendor | Parameters | Architecture | SOTA | Benchmarks | Results |
|---|---|---|---|---|---|---|---|
| 601 | DETR-DC5 | — | — | — | 1 | 1 | |
| 602 | DETR-DC5-R101 | — | — | — | 1 | 1 | |
| 603 | DETR-R101 | — | — | — | 1 | 1 | |
| 604 | DINO (ResNet-50) | Research (IDEA Research) | Unknown | DETR with Improved DeNoising Anchor Boxes + ResNet-50 backbone | 1 | 1 | |
| 605 | DINO (Swin-L) | Research | — | Transformer Detector | 1 | 1 | |
| 606 | DINO (Swin-L) | IDEA Research | Unknown | DETR with Improved deNoising anchOr boxes | 1 | 1 | |
| 607 | DINO-ViT-L | IDEA-Research | — | — | 1 | 1 | |
| 608 | DINOv2 (ViT-g) + Linear | Meta AI | Unknown | Self-supervised ViT-giant + linear head | 1 | 1 | |
| 609 | DINOv3 + Plain-DETR | — | — | — | 1 | 1 | |
| 610 | DINOv3 + linear probe | — | — | — | 1 | 1 | |
| 611 | DPText-DETR | AAAI 2023 | — | — | 1 | 1 | |
| 612 | DRRG | CVPR 2020 | — | — | 1 | 1 | |
| 613 | Dater | Unknown | Unknown | Unknown | 1 | 1 | |
| 614 | DeepLabV3+ | Unknown | Unknown | Unknown | 1 | 1 | |
| 615 | Deformable DETR | — | — | — | 1 | 1 | |
| 616 | Deformable DETR + iterative bounding box refinement | — | — | — | 1 | 1 | |
| 617 | Deformable DETR + iterative bounding box refinement + two-stage Deformable DETR | — | — | — | 1 | 1 | |
| 618 | DiT-B | Unknown | Unknown | Unknown | 1 | 1 | |
| 619 | DiT-B (Cascade) | Unknown | Unknown | Unknown | 1 | 1 | |
| 620 | DiT-Base | Microsoft | — | Vision Transformer (self-supervised) | 1 | 1 | |
| 621 | DiT-L (Cascade R-CNN) | Microsoft Research | Unknown | Document Image Transformer (BEiT-based) + Cascade R-CNN detection head | 1 | 1 | |
| 622 | DiT-Large | Microsoft | Unknown | Document Image Transformer Large | 1 | 1 | |
| 623 | DistillCodeT5 | FSOFT AI Lab | — | Transformer encoder-decoder | 1 | 1 | |
| 624 | DoPTA (224×224) | — | — | Transformer | 1 | 1 | |
| 625 | DoPTA-HR (512×512) | — | — | Transformer | 1 | 1 | |
| 626 | DocBert [DOCBERT] | Unknown | Unknown | Unknown | 1 | 1 | |
| 627 | DocFormer large | Unknown | Unknown | Unknown | 1 | 1 | |
| 628 | DocFormerBASE | Unknown | Unknown | Unknown | 1 | 1 | |
| 629 | DocLayout-YOLO | Unknown | Unknown | Unknown | 1 | 1 | |
| 630 | DocXClassifier-B | Unknown | Unknown | Unknown | 1 | 1 | |
| 631 | DocXClassifier-FPN | Saifullah et al. | — | CNN with Feature Pyramid Network | 1 | 1 | |
| 632 | DocXClassifier-L | Unknown | Unknown | Unknown | 1 | 1 | |
| 633 | Docling | IBM Research | Unknown | Open-source document parsing toolkit (layout + OCR + table) | 1 | 1 | |
| 634 | Dolphin | Research | — | — | 1 | 1 | |
| 635 | Dolphin-1.5 | ByteDance | — | — | 1 | 1 | |
| 636 | Dolphin-v2 | ByteDance | — | — | 1 | 1 | |
| 637 | Donut | Unknown | Unknown | Unknown | 1 | 1 | |
| 638 | Dots OCR 1.5 | RedNote HILab | Unknown | OCR-specialised open-weight VLM | 1 | 1 | |
| 639 | EK-Net++ | Research | — | — | 1 | 1 | |
| 640 | ESALE | East China Normal University | 125M | transformer | 1 | 1 | |
| 641 | EVA-02 (ViT-L/14+) | BAAI | 304M | EVA-02 ViT-L/14+, public data only | 1 | 1 | |
| 642 | EVA-02-L | BAAI | Unknown | EVA-02 Large + Cascade Mask R-CNN | 1 | 1 | |
| 643 | EVA-02-L (LVIS) | BAAI | Unknown | EVA-02 Large + ViTDet | 1 | 1 | |
| 644 | Easter2.0 | Unknown | Unknown | Unknown | 1 | 1 | |
| 645 | Eff-GNN + Word2Vec [word2vec] | Unknown | Unknown | Unknown | 1 | 1 | |
| 646 | Eff-GNN + Word2Vec [word2vec] + Image Embedding | Unknown | Unknown | Unknown | 1 | 1 | |
| 647 | EfficientDet-D7x | — | — | EfficientNet+BiFPN | 1 | 1 | |
| 648 | EfficientNet-B0 | — | 5.3M | CNN | 1 | 1 | |
| 649 | EfficientNetV2-L | — | 120M | CNN | 1 | 1 | |
| 650 | Extend | Extend | Unknown | Document parsing + extraction API | 1 | 1 | |