
imagenet-1k

Image classification benchmark

Total Results: 16
Models Tested: 16
Metrics: 1
Last Updated: 2025-12-21

Metric: top-1-accuracy (higher is better)
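The leaderboard's single metric is top-1 accuracy: a prediction counts as correct only when the model's highest-scoring class equals the ground-truth label. A minimal sketch in plain Python (function name and toy data are illustrative, not part of any benchmark harness):

```python
def top1_accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    correct = 0
    for scores, label in zip(logits, labels):
        pred = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if pred == label:
            correct += 1
    return correct / len(labels)

# Toy example: 3 samples, 4 classes.
logits = [
    [0.1, 0.7, 0.1, 0.1],    # predicts class 1
    [0.9, 0.0, 0.05, 0.05],  # predicts class 0
    [0.2, 0.2, 0.5, 0.1],    # predicts class 2
]
labels = [1, 0, 3]
print(top1_accuracy(logits, labels))  # 2 of 3 correct -> 0.666...
```

On ImageNet-1K the same computation runs over the 50,000 validation images and 1,000 classes.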

Rank  Model              Score  Source
 1    coca-finetuned     91     google-research
      Current SOTA on ImageNet-1K. 2.1B parameters. Contrastive Captioner architecture.
 2    vit-g-14           90.45  google-research
      Giant ViT variant. 1.8B parameters.
 3    convnext-v2-huge   88.9   meta-research
      Best pure ConvNet. 650M parameters. Trained with FCMAE.
 4    vit-h-14           88.55  google-research
      Huge ViT variant. 632M parameters.
 5    swin-large         87.3   microsoft-research
      Hierarchical Vision Transformer with shifted windows.
 6    efficientnet-v2-l  85.7   google-research
      Pretrained on ImageNet-21K, fine-tuned on ImageNet-1K.
 7    deit-b-distilled   85.2   meta-research
      Data-efficient ViT with distillation. Trained on ImageNet-1K only.
 8    efficientnet-b7    84.4   google-research
      8.4x smaller than GPipe. 66M parameters.
 9    deit-b             83.1   meta-research
      Without distillation. Trained from scratch on ImageNet-1K.
10    convnext-v2-tiny   83     meta-research
      28M parameters. Efficient variant.
11    vit-l-16           82.7   google-research
      Large ViT with ImageNet-21K pretraining.
12    vit-b-16           81.2   google-research
      Base ViT with ImageNet-21K pretraining.
13    resnet-50-a3       80.4   timm-research
      ResNet Strikes Back. Modern training recipe on a classic architecture.
14    resnet-152         78.6   microsoft-research
      10-crop evaluation. Original deep residual network.
15    efficientnet-b0    77.1   google-research
      Only 5.3M parameters. Baseline for compound scaling.
16    resnet-50          76.15  pytorch-vision
      Standard torchvision baseline. 25M parameters.
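The resnet-152 entry reports 10-crop evaluation: the classifier scores ten crops of each image (four corners plus center, each with its horizontal flip) and the per-class scores are averaged before taking the argmax. A minimal sketch of the score-averaging step, assuming per-crop scores have already been computed (names and toy data are illustrative):

```python
def ten_crop_predict(crop_scores):
    """Average class scores across the crops of one image, then argmax.

    crop_scores: list of score vectors, one per crop (10 for 10-crop eval).
    """
    n_classes = len(crop_scores[0])
    mean = [sum(crop[c] for crop in crop_scores) / len(crop_scores)
            for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)  # argmax of the mean

# Toy example: 10 crops, 3 classes; class 2 wins on average.
crops = [[0.1, 0.2, 0.7]] * 6 + [[0.5, 0.3, 0.2]] * 4
print(ten_crop_predict(crops))  # -> 2
```

Averaging over crops typically adds a point or two of top-1 accuracy over single-crop evaluation, at roughly 10x the inference cost.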
