| # | Model | Notes | Status | Top-1 (%) | Year |
| --- | --- | --- | --- | --- | --- |
| 01 | coca-finetuned | Current SOTA on ImageNet-1K. 2.1B parameters. Contrastive Captioner architecture. | paper | 91 | 2025 |
| 02 | CoCa (finetuned) | Current SOTA on ImageNet-1K. 2.1B parameters. Contrastive Captioner architecture. | unverified | 91 | 2025 |
| 03 | vit-g-14 | Giant ViT variant. 1.8B parameters. | paper | 90.45 | 2025 |
| 04 | ViT-G/14 | Giant ViT variant. 1.8B parameters. | unverified | 90.45 | 2025 |
| 05 | SoViT-400m/14 | SoViT-400m/14, shape-optimized ViT with 400M params. Finetuned on ImageNet-1K at 224px. 90.3% top-1 on IN-1K val. Surpasses ViT-g/14 (90.0%) at less than half the inference cost. NeurIPS 2023, paper revised Jan 2024. Source: arxiv:2305.13035 abstract. | unverified | 90.3 | 2026 |
| 06 | EVA-02-L | EVA-02 ViT-L/14+ 304M params. MIM pre-training on Merged-38M, finetuned on IN-22K then IN-1K at 448x448. Source: timm results CSV (eva02_large_patch14_448.mim_m38m_ft_in22k_in1k). Paper: arxiv:2303.11331. | verified | 90.056 | 2026 |
| 07 | EVA-Giant | EVA ViT-Giant/14, 1B params. MIM pre-training on Merged-30M, finetuned on IN-22K then IN-1K at 560x560. Source: timm results CSV (eva_giant_patch14_560.m30m_ft_in22k_in1k). Paper: arxiv:2211.07636. | verified | 89.79 | 2026 |
| 08 | InternImage-H | InternImage-H 1.08B params with deformable convolutions. IN-22K pretraining + joint ImageNet training, 640x640. Source: OpenGVLab/InternImage classification README. Paper: arxiv:2211.05778. | verified | 89.6 | 2026 |
| 09 | AIMv2-3B | AIMv2-3B, multimodal autoregressive pre-training, 2.7B params, 448px. 89.5% top-1 on IN-1K val using attentive probing (frozen backbone + 2-layer attentive head). Apple, Nov 2024. Source: github.com/apple/ml-aim README table. Paper: arxiv:2411.14402. | paper | 89.5 | 2026 |
| 10 | SigLIP-SO400M | Shape-Optimized SigLIP 400M, patch14, res 378. Contrastive pre-training on WebLI, finetuned on IN-1K. Source: timm results CSV (vit_so400m_patch14_siglip_378.webli_ft_in1k). Paper: arxiv:2303.15343. | verified | 89.41 | 2026 |
| 11 | convnext-v2-huge | Best pure ConvNet. 650M parameters. Trained with FCMAE. | paper | 88.9 | 2025 |
| 12 | ConvNeXt V2 Huge | Best pure ConvNet. 650M parameters. Trained with FCMAE. | unverified | 88.9 | 2025 |
| 13 | ViT-H/14 CLIP (LAION-2B) | ViT-H/14 CLIP pre-trained on LAION-2B, finetuned on IN-12K then IN-1K at 336px. Source: timm results CSV (vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k). Paper: arxiv:2212.07143. | verified | 88.634 | 2026 |
| 14 | ConvNeXt-XXLarge (CLIP LAION) | ConvNeXt-XXLarge, CLIP pre-trained on LAION-2B, soup finetuned on IN-1K. Source: timm results CSV (convnext_xxlarge.clip_laion2b_soup_ft_in1k). Paper: arxiv:2212.07143 (OpenCLIP). | verified | 88.622 | 2026 |
| 15 | ViT-H/14 | Huge ViT variant. 632M parameters. | unverified | 88.55 | 2025 |
| 16 | vit-h-14 | Huge ViT variant. 632M parameters. | paper | 88.55 | 2025 |
| 17 | InternViT-6B (InternVL) | InternViT-6B, 6B-param vision encoder, patch14, 224px. 88.23% Acc@1 on IN-1K val (50k images) via linear probing (frozen backbone + linear head). OpenGVLab, CVPR 2024 Oral. Source: Hugging Face model card OpenGVLab/InternViT-6B-448px-V2_5. Paper: arxiv:2312.14238. | unverified | 88.23 | 2026 |
| 18 | swin-large | Hierarchical Vision Transformer with shifted windows. | paper | 87.3 | 2025 |
| 19 | Swin Transformer Large | Hierarchical Vision Transformer with shifted windows. | unverified | 87.3 | 2025 |
| 20 | efficientnet-v2-l | Pretrained on ImageNet-21K, fine-tuned on 1K. | paper | 85.7 | 2025 |
| 21 | EfficientNetV2-L | Pretrained on ImageNet-21K, fine-tuned on 1K. | unverified | 85.7 | 2025 |
| 22 | MambaVision-L2 | MambaVision-L2, hybrid Mamba-Transformer backbone, 241M params. Finetuned on ImageNet-1K at 224px. 85.3% top-1. Sets new SOTA Pareto front for accuracy vs. throughput. NVIDIA, CVPR 2025. Source: arxiv:2407.08083 Table 1. | unverified | 85.3 | 2026 |
| 23 | deit-b-distilled | Data-efficient ViT with distillation. Trained on ImageNet-1K only. | paper | 85.2 | 2025 |
| 24 | DeiT-B Distilled | Data-efficient ViT with distillation. Trained on ImageNet-1K only. | unverified | 85.2 | 2025 |
| 25 | EfficientNet-B7 | 8.4x smaller than GPipe. 66M parameters. | unverified | 84.4 | 2025 |
| 26 | DeiT-B | Without distillation. Trained from scratch on ImageNet-1K. | unverified | 83.1 | 2025 |
| 27 | ConvNeXt V2 Tiny | 28M parameters. Efficient variant. | unverified | 83 | 2025 |
| 28 | convnext-v2-tiny | 28M parameters. Efficient variant. | paper | 83 | 2025 |
| 29 | vit-l-16 | Large ViT with ImageNet-21K pretraining. | paper | 82.7 | 2025 |
| 30 | ViT-L/16 | Large ViT with ImageNet-21K pretraining. | unverified | 82.7 | 2025 |
| 31 | vit-b-16 | Base ViT with ImageNet-21K pretraining. | paper | 81.2 | 2025 |
| 32 | ViT-B/16 | Base ViT with ImageNet-21K pretraining. | unverified | 81.2 | 2025 |
| 33 | ResNet-50 (A3 training) | ResNet Strikes Back. Modern training recipe on classic architecture. | unverified | 80.4 | 2025 |
| 34 | resnet-50-a3 | ResNet Strikes Back. Modern training recipe on classic architecture. | paper | 80.4 | 2025 |
| 35 | resnet-152 | 10-crop evaluation. Original deep residual network. | paper | 78.6 | 2025 |
| 36 | efficientnet-b0 | Only 5.3M parameters. Baseline for compound scaling. | paper | 77.1 | 2025 |
| 37 | resnet-50 | Standard torchvision baseline. 25M parameters. | paper | 76.15 | 2025 |
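
The scores in the table are top-1 accuracies on the ImageNet-1K validation set, as several rows state explicitly. As a rough illustration of what that metric measures, the following is a minimal sketch in plain Python. The function name `top1_accuracy` and the toy scores are hypothetical and are not taken from any of the cited sources; real evaluations run over the 50k validation images with each model's own preprocessing.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy in percent.

    scores[i] holds one score per class for sample i;
    labels[i] is the index of the correct class for sample i.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(scores, labels):
        # The prediction is the index of the highest-scoring class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy data (hypothetical): 4 samples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.8, 0.1, 0.1],  # predicts class 0
    [0.3, 0.3, 0.4],  # predicts class 2
    [0.2, 0.5, 0.3],  # predicts class 1
]
labels = [1, 0, 2, 2]

accuracy = top1_accuracy(scores, labels)
print(f"{accuracy:.1f}")  # 75.0
```

Scores produced this way are only comparable when the evaluation protocol matches. The rows above mix full finetuning, linear probing, attentive probing, and 10-crop evaluation, at input resolutions from 224px to 640px.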