Codesota · Models
1,357 models indexed · 896 match filter

Every model, measured.

Start with a research area, drill into a vendor, or page through the full index. Only models with at least one benchmark score appear — a model without a recorded score can’t be ranked.

Vendor: Areas overview, speakleash · 253, OpenAI · 85, Google · 71, Qwen · 52, Alibaba · 47, Anthropic · 44, Microsoft · 35, Meta · 30, Mistral · 30, DeepSeek · 28, google · 19, meta-llama · 19, mistralai · 19, Meta AI · 15, CYFRAGOVPL · 14, Zhipu AI · 13, NVIDIA · 10, SpeakLeash · 10, internlm · 10, xAI · 10, ByteDance · 9, Baidu · 8, PLLuM · 8, ibm-granite · 8, microsoft · 8, Amazon · 7, Google DeepMind · 7, MiniMax · 7, Mistral AI · 7, Remek · 7, Shanghai AI Lab · 7, allenai · 7, utter-project · 7, CohereForAI · 6, Microsoft Research · 6, Salesforce · 6, 01-ai · 5, Alibaba Cloud · 5, Cohere · 5, Moonshot AI · 5, NousResearch · 5, THUML · 5, deepseek-ai · 5, DeepMind · 4, Facebook AI · 4, IBM · 4, Meituan · 4, Stanford · 4, THUDM · 4, UC San Diego · 4, VikParuchuri · 4, gguf-iq · 4, nvidia · 4, openchat · 4, tiiuae · 4, Allen AI · 3, BAAI · 3, Du et al. · 3, ForgeCode · 3, Fudan University · 3, IDEA Research · 3, Liao et al. · 3, Moonshot.AI · 3, Nam Tuan Ly / NII · 3, OPI-PG · 3, OpenDataLab · 3, ViCoS Lab Ljubljana · 3, Xiaomi · 3, Zhao et al. · 3, gguf · 3, gguf11bv30 · 3, gguf7bv30 · 3, upstage · 3, + 247 smaller vendors (291 models)
§ 01 · Computer Vision models

896 models in Computer Vision · page 4 of 18.
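
The page counter follows from simple arithmetic. A minimal sketch, assuming a page size of 50 rows (inferred from this page listing rows 151 to 200; the site does not state it) and a hypothetical helper named `page_bounds`:

```python
import math


def page_bounds(page: int, page_size: int, total: int) -> tuple[int, int]:
    """Return the 1-based (first, last) row numbers shown on `page`."""
    pages = math.ceil(total / page_size)
    if not 1 <= page <= pages:
        raise ValueError(f"page must be between 1 and {pages}")
    first = (page - 1) * page_size + 1
    # The final page may be shorter than page_size.
    last = min(page * page_size, total)
    return first, last


PAGE_SIZE = 50  # assumption: inferred from rows 151-200 appearing on page 4
TOTAL = 896     # models matching the Computer Vision filter

print(math.ceil(TOTAL / PAGE_SIZE))        # 18 pages
print(page_bounds(4, PAGE_SIZE, TOTAL))    # (151, 200)
print(page_bounds(18, PAGE_SIZE, TOTAL))   # (851, 896)
```

Under that assumption, the last page holds 46 rows rather than a full 50.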

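| # | Model | Vendor | Parameters | Architecture | SOTA | Benchmarks | Results |
|---|---|---|---|---|---|---|---|
| 151 | TFLOP | Upstage AI | Unknown | Layout Pointer mechanism; span-aware contrastive supervision; reformulates TSR as text region pointing | 1 | 1 | 2 |
| 152 | Tesseract | Google (Open Source) | | Traditional OCR | 1 | 2 | 2 |
| 153 | ViT-H/14 | Google | 632M | Vision Transformer | 1 | 2 | 2 |
| 154 | BDN | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 155 | BioRex+Directionality | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 156 | Bluche | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 157 | CDeCNet | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 158 | CNN | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 159 | CNN + BLSTM | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 160 | Co-DETR (Swin-L) | Research | | Transformer Detector | 1 | 1 | 1 |
| 161 | Co-DETR (Swin-L) | Research | Unknown | Collaborative DETR + Swin-L backbone | 1 | 1 | 1 |
| 162 | CoCa (ViT-G/14) | Google | 2.1B | Contrastive Captioner on ViT-G/14 | 1 | 1 | 1 |
| 163 | CoCa (finetuned) | Google | 2.1B | Contrastive Captioner | 1 | 1 | 1 |
| 164 | ConvTextTM | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 165 | DAL | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 166 | DINOv3 + Mask2Former (simple) | | | | 1 | 1 | 1 |
| 167 | DINOv3 + Plain-DETR + TTA | | | | 1 | 1 | 1 |
| 168 | DOCmT5 | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 169 | DiT-L (Cascade) | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 170 | DocFormerv2-Large | Adobe Research | Unknown | Multimodal encoder with spatial-aware cross-attention | 1 | 1 | 1 |
| 171 | Document Classification Using Importance of Sentences | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 172 | EAML | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 173 | GCN Hybrid | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 174 | I2L-NOPOOL | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 175 | JDeskew | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 176 | KHCR | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 177 | LayoutLMv3 | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 178 | LlamaParse Agentic | LlamaIndex | Unknown | Agentic multi-step LlamaParse pipeline | 1 | 1 | 1 |
| 179 | MetaSelf-Learning | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 180 | Oracle-BERT | Unknown | | oracle-extractive | 1 | 1 | 1 |
| 181 | Oracle-BERT (HowSumm-Method) | Unknown | | | 1 | 1 | 1 |
| 182 | PGNet-A | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 183 | PesRec | Xingwen Cao et al. (LIESMARS, Wuhan University) | | Multi-task CNN: spatial layout estimator + 3D object detector + mesh generator | 1 | 1 | 1 |
| 184 | Proposed System (With post-processing) | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 185 | Q-SENN | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 186 | Query-doc RobeCzech (Roberta-base) | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 187 | REXEL | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 188 | ResNet-RS (ResNet-200 + RS training tricks) | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 189 | SENet | Momenta | | | 1 | 1 | 1 |
| 190 | STREET | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 191 | ScyllaNet | Scylla Technologies | | | 1 | 1 | 1 |
| 192 | Seed1.6-vision | ByteDance | | Vision-Language Model | 1 | 1 | 1 |
| 193 | Siamese_MHCA_SA | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 194 | Siamese_MultiHeadCrossAttention_SoftAttention (Siamese_MHCA_SA) | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 195 | StarCoder-LoRA | BigCode / Salesforce | 15.5B | Transformer decoder | 1 | 1 | 1 |
| 196 | Swin Transformer V2 Large | Microsoft | 197M | Hierarchical Vision Transformer | 1 | 1 | 1 |
| 197 | TCM | | | CLIP-based | 1 | 1 | 1 |
| 198 | TabTracer | Unknown | | | 1 | 1 | 1 |
| 199 | Transformer w/ CNN | Unknown | Unknown | Unknown | 1 | 1 | 1 |
| 200 | VGG | Unknown | Unknown | Unknown | 1 | 1 | 1 |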