MVTec AD · 11 scored runs · 11 distinct models · Updated 2026-04-20

§ 00 · Opening

Visual anomaly detection, the industrial gold standard.

MVTec AD is the reference benchmark for unsupervised visual anomaly detection in industrial inspection. Its headline metric is Image AUROC: how cleanly a detector's anomaly scores separate defective from pristine parts on a held-out test set of real factory images.

§ 01 · Leaderboard · Image AUROC

Image AUROC, ranked.

Area under the ROC curve for the image-level defective-versus-pristine decision, averaged over the 15 categories (higher is better).

#	Model	Image AUROC	Verified	Source
01	SimpleNet	99.6	—	codesota-api
02	PatchCore	99.1	—	codesota-api
03	EfficientAD	99.1	—	codesota-api

Fig · 3 results on Image AUROC, fetched from the CodeSOTA API on 2026-04-20. Rows sourced from benchmarks.json; row 01 marks the current SOTA.

§ 02 · Leaderboard · AUROC (lowercase metric tag)

AUROC (lowercase metric tag), ranked.

Same quantity as Image AUROC, recorded under the lowercase tag that some submissions use (higher is better).

#	Model	AUROC (lowercase metric tag)	Verified	Source
01	simplenet	99.6	—	codesota-api
02	fastflow	99.4	—	codesota-api
03	patchcore	99.1	—	codesota-api
04	efficientad	99.1	—	codesota-api
05	reverse-distillation	98.5	—	codesota-api
06	cflow-ad	98.3	—	codesota-api
07	draem	98.0	—	codesota-api
08	padim	97.9	—	codesota-api

Fig · 8 results on AUROC (lowercase metric tag), fetched from the CodeSOTA API on 2026-04-20. Rows sourced from benchmarks.json; row 01 marks the current SOTA.

§ What it measures

Image AUROC, near the ceiling.

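Image AUROC is the area under the Receiver Operating Characteristic curve for the binary defective-versus-pristine decision at the image level, reported here on a 0 to 100 scale. It equals the probability that a randomly chosen defective image receives a higher anomaly score than a randomly chosen normal one, so a perfect detector scores 100 and random guessing scores about 50. With the current SOTA at 99.6, the leaderboard is effectively saturated, and much of the active research has moved to pixel-level AUROC and PRO (per-region overlap), which score how well defects are localized.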
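The following minimal Python sketch makes the definition concrete. It computes image-level AUROC from per-image anomaly scores using the pairwise (Mann-Whitney) form: the fraction of (defective, normal) image pairs that the detector orders correctly, with ties counted as half. The function name `image_auroc` and the 0 to 100 scaling are illustrative choices, not CodeSOTA's scoring code. `sklearn.metrics.roc_auc_score` computes the same quantity on a 0 to 1 scale.

```python
from itertools import product


def image_auroc(normal_scores, defect_scores):
    """Image-level AUROC on a 0-100 scale (illustrative sketch, not the
    benchmark's official evaluation code).

    Higher anomaly score = more likely defective. The result is the share of
    (defective, normal) pairs where the defective image scores higher,
    counting ties as half a correct ordering.
    """
    if not normal_scores or not defect_scores:
        raise ValueError("need at least one normal and one defective image")
    correct = 0.0
    for normal, defect in product(normal_scores, defect_scores):
        if defect > normal:
            correct += 1.0
        elif defect == normal:
            correct += 0.5
    return 100.0 * correct / (len(normal_scores) * len(defect_scores))
```

For example, normal scores `[0.1, 0.2, 0.3]` and defective scores `[0.25, 0.9]` order 5 of the 6 pairs correctly, which gives an AUROC of about 83.3. The pairwise loop is O(n·m). That is fine for a benchmark-sized test split, but a sort-based rank computation scales better.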

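Because scores cluster in the 97–99 range, small differences still matter. AUROC is threshold-free, so a tenth of a point does not map to a fixed number of misses. At production volumes, however, even a small drop in ranking quality can mean more missed defects or false alarms per shift at whatever threshold the line runs.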
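AUROC summarizes every possible threshold at once, while a deployed inspection line runs at one. The sketch below assumes the operator caps the false-alarm rate on pristine parts. It picks the matching threshold from normal-image scores and reports the share of defective images that would slip through. The function name `miss_rate_at_fpr` and the 1% default cap are hypothetical.

```python
import math


def miss_rate_at_fpr(normal_scores, defect_scores, max_fpr=0.01):
    """Hypothetical operating-point check.

    An image is flagged when its score is strictly above the threshold.
    Choose the lowest threshold that flags at most `max_fpr` of normal
    images, then return (threshold, fraction of defective images missed).
    """
    if not normal_scores or not defect_scores:
        raise ValueError("need at least one normal and one defective image")
    if not 0.0 <= max_fpr <= 1.0:
        raise ValueError("max_fpr must be between 0 and 1")
    allowed = math.floor(max_fpr * len(normal_scores))  # tolerated false alarms
    ranked = sorted(normal_scores, reverse=True)
    # Only the `allowed` highest normal scores can sit strictly above ranked[allowed].
    threshold = ranked[allowed] if allowed < len(ranked) else -math.inf
    missed = sum(1 for score in defect_scores if score <= threshold)
    return threshold, missed / len(defect_scores)
```

AUROC averages over every threshold. Two detectors with nearly identical AUROC can therefore still differ at a strict false-alarm cap, so the operating point is worth checking alongside the leaderboard number.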

§ Dataset details

Real factory images, real defects.

MVTec AD was released by MVTec Software GmbH. The dataset contains real industrial-inspection images across 15 categories, 10 objects and 5 textures: bottle, cable, capsule, carpet, grid, hazelnut, leather, metal nut, pill, screw, tile, toothbrush, transistor, wood, and zipper. Defect regions carry pixel-accurate annotations.

Models train on defect-free examples only; the test split contains both normal and defective samples, and the score is the mean over the 15 categories.
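The protocol above can be sketched as follows. The sketch assumes a separate detector has already been fit on each category's defect-free training images and has produced one anomaly score per test image. The helper names and the input layout are hypothetical, and the category identifiers are written as lowercase strings (e.g. `metal_nut`) for illustration.

```python
from statistics import mean

# Hypothetical identifiers for the 15 MVTec AD categories.
MVTEC_AD_CATEGORIES = (
    "bottle", "cable", "capsule", "carpet", "grid", "hazelnut", "leather",
    "metal_nut", "pill", "screw", "tile", "toothbrush", "transistor",
    "wood", "zipper",
)


def pairwise_auroc(normal, defect):
    """AUROC (0-100): share of (defective, normal) pairs ordered correctly, ties count half."""
    if not normal or not defect:
        raise ValueError("each category needs normal and defective test images")
    correct = sum(
        1.0 if d > n else 0.5 if d == n else 0.0 for n in normal for d in defect
    )
    return 100.0 * correct / (len(normal) * len(defect))


def benchmark_score(scores_by_category):
    """`scores_by_category` maps category -> (normal_scores, defect_scores)
    from that category's test split. Returns the per-category AUROC and
    their unweighted mean, the headline number."""
    missing = set(MVTEC_AD_CATEGORIES) - set(scores_by_category)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    per_category = {
        c: pairwise_auroc(*scores_by_category[c]) for c in MVTEC_AD_CATEGORIES
    }
    return per_category, mean(per_category.values())
```

Averaging per category gives each of the 15 categories equal weight, regardless of how many test images it has. It also never compares raw scores across categories, whose scales can differ.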

§ How scores are verified

Reported, then reproduced.

Each row above is an Image AUROC reported by the submitting paper or repository, preserved verbatim. Where a paper reports different numbers under different inference settings (e.g. with or without test-time augmentation), the row reflects the best reported figure the authors stand behind.
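The two leaderboards above record the same quantity under two tags, and the policy keeps the best figure the authors stand behind. The hypothetical sketch below collapses such rows into one ranking per model. The row layout, the tag aliases, and the name normalization are assumptions for illustration, not how benchmarks.json is actually processed.

```python
import re

# Assumption: both tags record the same image-level quantity, as this page
# states for the lowercase tag.
IMAGE_AUROC_TAGS = {"image auroc", "auroc"}


def normalize_model(name):
    """Hypothetical key: lowercase, drop anything that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def merge_leaderboard(rows):
    """rows: iterable of (model_name, metric_tag, value).

    Keeps the highest Image AUROC per normalized model name (first-seen
    spelling is used for display) and returns (name, value) pairs, best first.
    """
    best = {}
    for model, tag, value in rows:
        if tag.strip().lower() not in IMAGE_AUROC_TAGS:
            continue  # skip other metrics, e.g. pixel-level scores
        key = normalize_model(model)
        if key not in best or value > best[key][1]:
            best[key] = (model, value)
    return sorted(best.values(), key=lambda pair: pair[1], reverse=True)
```

String normalization is the simplest choice here. An explicit alias table would be sturdier, because two genuinely different models could in principle collapse to the same key.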

Full policy: /methodology.

§ Final · Related OCR benchmarks