
imagenet-1k

Image classification benchmark

Total Results: 16
Models Tested: 16
Metrics: 1
Last Updated: 2025-12-21

Metric: top-1-accuracy (higher is better)
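The leaderboard's single metric is top-1 accuracy: a prediction counts as correct only when the model's highest-scoring class equals the ground-truth label. A minimal sketch in plain Python (function name and toy data are illustrative, not part of any benchmark harness):

```python
def top1_accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    correct = 0
    for scores, label in zip(logits, labels):
        pred = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if pred == label:
            correct += 1
    return correct / len(labels)

# Toy example: 3 samples, 4 classes.
logits = [
    [0.1, 0.7, 0.1, 0.1],    # predicts class 1
    [0.9, 0.0, 0.05, 0.05],  # predicts class 0
    [0.2, 0.2, 0.5, 0.1],    # predicts class 2
]
labels = [1, 0, 3]
print(top1_accuracy(logits, labels))  # 2 of 3 correct -> 0.666...
```

On ImageNet-1K the same computation runs over the 50,000 validation images and 1,000 classes.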

Rank  Model              Score  Source
 1    coca-finetuned     91     google-research
      Current SOTA on ImageNet-1K. 2.1B parameters. Contrastive Captioner architecture.
 2    vit-g-14           90.45  google-research
      Giant ViT variant. 1.8B parameters.
 3    convnext-v2-huge   88.9   meta-research
      Best pure ConvNet. 650M parameters. Trained with FCMAE.
 4    vit-h-14           88.55  google-research
      Huge ViT variant. 632M parameters.
 5    swin-large         87.3   microsoft-research
      Hierarchical Vision Transformer with shifted windows.
 6    efficientnet-v2-l  85.7   google-research
      Pretrained on ImageNet-21K, fine-tuned on ImageNet-1K.
 7    deit-b-distilled   85.2   meta-research
      Data-efficient ViT with distillation. Trained on ImageNet-1K only.
 8    efficientnet-b7    84.4   google-research
      8.4x smaller than GPipe. 66M parameters.
 9    deit-b             83.1   meta-research
      Without distillation. Trained from scratch on ImageNet-1K.
10    convnext-v2-tiny   83     meta-research
      28M parameters. Efficient variant.
11    vit-l-16           82.7   google-research
      Large ViT with ImageNet-21K pretraining.
12    vit-b-16           81.2   google-research
      Base ViT with ImageNet-21K pretraining.
13    resnet-50-a3       80.4   timm-research
      ResNet Strikes Back. Modern training recipe on a classic architecture.
14    resnet-152         78.6   microsoft-research
      10-crop evaluation. Original deep residual network.
15    efficientnet-b0    77.1   google-research
      Only 5.3M parameters. Baseline for compound scaling.
16    resnet-50          76.15  pytorch-vision
      Standard torchvision baseline. 25M parameters.
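The resnet-152 entry reports 10-crop evaluation: the classifier scores ten crops of each image (four corners plus center, each with its horizontal flip) and the per-class scores are averaged before taking the argmax. A minimal sketch of the score-averaging step, assuming per-crop scores have already been computed (names and toy data are illustrative):

```python
def ten_crop_predict(crop_scores):
    """Average class scores across the crops of one image, then argmax.

    crop_scores: list of score vectors, one per crop (10 for 10-crop eval).
    """
    n_classes = len(crop_scores[0])
    mean = [sum(crop[c] for crop in crop_scores) / len(crop_scores)
            for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)  # argmax of the mean

# Toy example: 10 crops, 3 classes; class 2 wins on average.
crops = [[0.1, 0.2, 0.7]] * 6 + [[0.5, 0.3, 0.2]] * 4
print(ten_crop_predict(crops))  # -> 2
```

Averaging over crops typically adds a point or two of top-1 accuracy over single-crop evaluation, at roughly 10x the inference cost.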
