CIFAR-10
60K 32x32 color images in 10 classes. Classic small-scale image classification benchmark with 50K training and 10K test images.
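The dataset's "binary version" stores each example as a fixed-size record: one label byte followed by 3×32×32 = 3,072 pixel bytes laid out as full R, G, and B planes. A minimal parser, sketched here with NumPy and checked on a synthetic record rather than real dataset bytes:

```python
import numpy as np

RECORD_BYTES = 1 + 3 * 32 * 32  # 1 label byte + 3 channel planes of 32x32 pixels

def parse_cifar10_records(buf: bytes):
    """Parse raw CIFAR-10 binary-version records into (labels, images).

    Each 3073-byte record is 1 label byte followed by 3072 pixel bytes
    stored channel-major (all R, then all G, then all B).
    """
    data = np.frombuffer(buf, dtype=np.uint8).reshape(-1, RECORD_BYTES)
    labels = data[:, 0]
    # (N, 3, 32, 32) channel-major -> (N, 32, 32, 3) channel-last
    images = data[:, 1:].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    return labels, images

# Synthetic one-record buffer for illustration (not real dataset bytes):
fake = bytes([7]) + bytes(range(256)) * 12  # 1 + 3072 bytes
labels, images = parse_cifar10_records(fake)  # labels[0] == 7, images.shape == (1, 32, 32, 3)
```

In practice most users load the dataset through a library helper (e.g. `torchvision.datasets.CIFAR10`), which handles download and this parsing internally.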
Benchmark Stats
SOTA History
Metric: accuracy (higher is better)
| Rank | Model | Source | Score | Year | Paper |
|---|---|---|---|---|---|
| 1 | ViT-H/14 (JFT-300M): Vision Transformer ViT-H/14, pre-trained on JFT-300M and fine-tuned on CIFAR-10. 99.50 ± 0.06% reported in Table 2 of the ViT paper. This is the published state of the art on CIFAR-10 as of 2025 (PapersWithCode SOTA). Paper: Dosovitskiy et al., ICLR 2021, arXiv:2010.11929. | Community | 99.50 | 2021 | Source |
| 2 | ViT-L/16 (JFT-300M): Vision Transformer ViT-L/16, pre-trained on JFT-300M and fine-tuned on CIFAR-10. 99.42% reported in Table 2 of the ViT paper. Paper: Dosovitskiy et al., ICLR 2021, arXiv:2010.11929. | Community | 99.42 | 2021 | Source |
| 3 | BiT-L (ResNet152x4): Big Transfer (BiT) ResNet-152x4, the large upstream variant, pre-trained on JFT-300M and fine-tuned on CIFAR-10. 99.37% reported in Table 1 of the BiT paper. Paper: Kolesnikov et al., ECCV 2020, arXiv:1912.11370. | Community | 99.37 | 2020 | Source |
| 4 | ViT-H/14 (IN-21K): Vision Transformer ViT-H/14, pre-trained on ImageNet-21K and fine-tuned on CIFAR-10. 99.27% reported in Table B1 (appendix) of the ViT paper. Paper: Dosovitskiy et al., ICLR 2021, arXiv:2010.11929. | Community | 99.27 | 2021 | Source |
| 5 | deit-b-distilled: Near-SOTA on CIFAR-10 via transfer learning. | Editorial | 99.1 | 2025 | Source |
| 6 | ViT-L/16 (IN-21K): Vision Transformer ViT-L/16, pre-trained on ImageNet-21K and fine-tuned on CIFAR-10. 99.0% reported in the ViT paper. Paper: Dosovitskiy et al., ICLR 2021, arXiv:2010.11929. | Community | 99.0 | 2021 | Source |
| 7 | EfficientNet-B8 (NoisyStudent): EfficientNet-B8 trained with Noisy Student self-training and input noise. 98.7% on CIFAR-10. Paper: Xie et al. 2020, arXiv:1911.04252. | Community | 98.7 | 2020 | Source |
| 8 | convnext-v2-base: Strong CNN performance on a small-scale benchmark. | Editorial | 98.7 | 2025 | Source |
| 9 | ViT-B/16 (IN-21K): Vision Transformer ViT-B/16, pre-trained on ImageNet-21K and fine-tuned on CIFAR-10. 98.13% reported in the ViT paper. Paper: Dosovitskiy et al., ICLR 2021, arXiv:2010.11929. | Community | 98.13 | 2021 | Source |
| 10 | Swin-B: Swin Transformer Base, pre-trained on IN-21K and fine-tuned on CIFAR-10. Paper: Liu et al., ICCV 2021, arXiv:2103.14030. | Community | 98.0 | 2021 | Source |
| 11 | resnet-50: With Cutout augmentation. | Editorial | 96.01 | 2025 | Source |
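The top entries in the table are separated by hundredths of a percentage point, and with only 10K test images such gaps can sit inside measurement noise. A quick back-of-the-envelope check, using a normal-approximation binomial confidence interval (note this captures test-set sampling error only; the ± 0.06 quoted for ViT-H/14 is a standard deviation over fine-tuning runs, a different quantity):

```python
import math

def accuracy_ci95(acc_pct: float, n: int = 10_000) -> float:
    """Half-width, in percentage points, of a 95% normal-approximation
    confidence interval for an accuracy of acc_pct measured on n examples."""
    p = acc_pct / 100.0
    se = math.sqrt(p * (1.0 - p) / n)  # binomial standard error
    return 1.96 * se * 100.0

# At 99.50% on CIFAR-10's 10K test set the CI half-width is roughly
# +/- 0.14 percentage points, wider than the gap between ranks 1 and 3.
half = accuracy_ci95(99.50)
```

This is one reason leaderboard positions separated by less than about 0.1-0.2 points on this benchmark should be read as ties rather than strict orderings.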