Codesota · Models
1,357 models indexed · 896 match filter
Editorial · Models
Every model, measured.
Start with a research area, drill into a vendor, or page through the full index. Only models with at least one benchmark score appear — a model without a recorded score can’t be ranked.
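The inclusion rule above can be sketched as a simple filter. This is a minimal illustration, not Codesota's actual implementation: the `Model` record, its `scores` field, and the sample names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical record shape; the real index schema is not shown in the source."""
    name: str
    scores: dict = field(default_factory=dict)  # benchmark name -> score


def rankable(models):
    """Keep only models that have at least one recorded benchmark score."""
    return [m for m in models if m.scores]


# Illustrative data only; names and scores are placeholders.
catalog = [
    Model("model-a", {"benchmark-x": 0.91}),
    Model("model-b"),  # no recorded score, so it cannot be ranked
]
visible = [m.name for m in rankable(catalog)]
```

Under this assumption, `visible` contains only `model-a`, since `model-b` has no score to rank by.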
Vendor: Areas overview
speakleash · 253; OpenAI · 85; Google · 71; Qwen · 52; Alibaba · 47; Anthropic · 44; Microsoft · 35; Meta · 30; Mistral · 30; DeepSeek · 28; google · 19; meta-llama · 19; mistralai · 19; Meta AI · 15; CYFRAGOVPL · 14; Zhipu AI · 13; NVIDIA · 10; SpeakLeash · 10; internlm · 10; xAI · 10; ByteDance · 9; Baidu · 8; PLLuM · 8; ibm-granite · 8; microsoft · 8; Amazon · 7; Google DeepMind · 7; MiniMax · 7; Mistral AI · 7; Remek · 7; Shanghai AI Lab · 7; allenai · 7; utter-project · 7; CohereForAI · 6; Microsoft Research · 6; Salesforce · 6; 01-ai · 5; Alibaba Cloud · 5; Cohere · 5; Moonshot AI · 5; NousResearch · 5; THUML · 5; deepseek-ai · 5; DeepMind · 4; Facebook AI · 4; IBM · 4; Meituan · 4; Stanford · 4; THUDM · 4; UC San Diego · 4; VikParuchuri · 4; gguf-iq · 4; nvidia · 4; openchat · 4; tiiuae · 4; Allen AI · 3; BAAI · 3; Du et al. · 3; ForgeCode · 3; Fudan University · 3; IDEA Research · 3; Liao et al. · 3; Moonshot.AI · 3; Nam Tuan Ly / NII · 3; OPI-PG · 3; OpenDataLab · 3; ViCoS Lab Ljubljana · 3; Xiaomi · 3; Zhao et al. · 3; gguf · 3; gguf11bv30 · 3; gguf7bv30 · 3; upstage · 3; + 247 smaller vendors (291 models)
§ 01 · Computer Vision models
896 models in Computer Vision · page 15 of 18.
| # | Model | Vendor | Parameters | Architecture | SOTA | Benchmarks | Results |
|---|---|---|---|---|---|---|---|
| 701 | LightOnOCR-1B-1025 | — | — | — | 1 | 1 | |
| 702 | LlamaParse Cost Effective | LlamaIndex | Unknown | Cost-optimised LlamaParse pipeline (<$0.004/page) | 1 | 1 | |
| 703 | MAEDet | IJCAI 2025 | — | — | 1 | 1 | |
| 704 | MAERec | Jiang et al. | Unknown | ViT backbone + Transformer decoder, MAE self-supervised pre-training on Union14M-U | 1 | 1 | |
| 705 | MAERec-S | Research | Unknown | Masked AutoEncoder for scene text Recognition (ViT-Small) | 1 | 1 | |
| 706 | MLDG | Unknown | Unknown | Unknown | 1 | 1 | |
| 707 | MORAN | Unknown | Unknown | Unknown | 1 | 1 | |
| 708 | MambaVision-L2 | NVIDIA | 241M | Hybrid Mamba-Transformer | 1 | 1 | |
| 709 | Marker 1.10.1 | VikParuchuri | — | PDF Parser | 1 | 1 | |
| 710 | Marker 1.8.2 | VikParuchuri | — | — | 1 | 1 | |
| 711 | Mask R-CNN (ResNeXt-101-FPN) | — | — | — | 1 | 1 | |
| 712 | Mask2Former (Swin-L) | Meta AI | Unknown | Masked-attention Mask Transformer + Swin-L | 1 | 1 | |
| 713 | Mask2Former (Swin-L) LVIS | Meta AI | Unknown | Masked-attention Mask Transformer + Swin-L | 1 | 1 | |
| 714 | Mask2Former + ResNet-50 | — | — | — | 1 | 1 | |
| 715 | Mask2Former + Swin-L-FaPN | — | — | — | 1 | 1 | |
| 716 | Mask2Former + Swin-T | — | — | — | 1 | 1 | |
| 717 | MaskFormer (Swin-T) | — | — | — | 1 | 1 | |
| 718 | MaskOCR-L | Unknown | Unknown | Unknown | 1 | 1 | |
| 719 | MinerU2-VLM | OpenDataLab | — | — | 1 | 1 | |
| 720 | MinerU2-pipeline | OpenDataLab | — | — | 1 | 1 | |
| 721 | Mistral OCR 2 | Mistral | — | Vision-Language Model | 1 | 1 | |
| 722 | MonkeyOCR-pro-1.2B | — | — | — | 1 | 1 | |
| 723 | MonkeyOCR-pro-1.2B | MonkeyOCR | — | — | 1 | 1 | |
| 724 | Mr. DETR | — | — | — | 1 | 1 | |
| 725 | Multimodal (MobileNetV2) | Unknown | Unknown | Unknown | 1 | 1 | |
| 726 | Multimodal (ResNet50) | Unknown | Unknown | Unknown | 1 | 1 | |
| 727 | Multimodal Side-Tuning (MobileNetV2) | Unknown | Unknown | Unknown | 1 | 1 | |
| 728 | Multimodal Side-Tuning (ResNet50) | Unknown | Unknown | Unknown | 1 | 1 | |
| 729 | NCBI_BERT(large) (P) | Unknown | Unknown | Unknown | 1 | 1 | |
| 730 | NCGM | Unknown | Unknown | Unknown | 1 | 1 | |
| 731 | NEC-UIUC | NEC / UIUC | — | — | 1 | 1 | |
| 732 | NJU-ImagineLab | Nanjing University | Unknown | Scene text detector | 1 | 1 | |
| 733 | Nanonets OCR2 3B | Nanonets | — | Vision-Language OCR Model | 1 | 1 | |
| 734 | Nanonets-OCR-s | Nanonets | — | — | 1 | 1 | |
| 735 | Nemotron Nano V2 VL | NVIDIA | — | Vision-Language Model | 1 | 1 | |
| 736 | NormTab (Targeted) + SQL | Unknown | Unknown | Unknown | 1 | 1 | |
| 737 | OCRFlux-3B | ChatDoc | — | — | 1 | 1 | |
| 738 | OCRVerse 4B | Unknown | 4B | Vision-Language OCR Model | 1 | 1 | |
| 739 | OTSNet | Anonymous / arxiv preprint | Unknown | Observation-Thinking-Spelling unified network | 1 | 1 | |
| 740 | OneFormer (Swin-L) | — | — | — | 1 | 1 | |
| 741 | Oracle-BOW | Unknown | — | oracle-extractive | 1 | 1 | |
| 742 | Oracle-BOW (HowSumm-Method) | Unknown | — | — | 1 | 1 | |
| 743 | Oracle-HierSumm | Unknown | — | oracle-extractive | 1 | 1 | |
| 744 | PAC | Yan et al. | — | — | 1 | 1 | |
| 745 | PANet (Joint) | ICCV 2019 | — | — | 1 | 1 | |
| 746 | PGNet-E | Unknown | Unknown | Unknown | 1 | 1 | |
| 747 | PLBART | UCLA / Columbia University | 140M | Transformer encoder-decoder | 1 | 1 | |
| 748 | POINTS-Reader | Research | — | — | 1 | 1 | |
| 749 | PP-StructureV3 | Baidu | — | — | 1 | 1 | |
| 750 | PSENet | CVPR 2019 | — | — | 1 | 1 | |
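
The page arithmetic behind "page 15 of 18" can be checked with a short sketch. The page size of 50 is an assumption inferred from the row numbers on this page (701 to 750); the function name `page_bounds` is hypothetical and not part of any Codesota API.

```python
import math


def page_bounds(total, page, page_size=50):
    """Return (page_count, first_row, last_row) for a 1-indexed page.

    page_size=50 is an assumption inferred from rows 701 to 750 above.
    """
    page_count = math.ceil(total / page_size)
    if not 1 <= page <= page_count:
        raise ValueError(f"page must be between 1 and {page_count}")
    first_row = (page - 1) * page_size + 1
    last_row = min(page * page_size, total)  # the final page may be short
    return page_count, first_row, last_row


print(page_bounds(896, 15))  # (18, 701, 750)
print(page_bounds(896, 18))  # (18, 851, 896)
```

With 896 matching models, the final page holds the remaining 46 rows rather than a full 50.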