MVTec AD · 11 scored runs · 11 distinct models · Updated 2026-04-20
§ 00 · Opening

Visual anomaly detection, the industrial gold standard.

MVTec AD is the reference benchmark for unsupervised visual anomaly detection in industrial inspection. It judges systems on Image AUROC — how cleanly the detector separates defective from pristine parts on a held-out test set of real factory imagery.

§ 01 · Leaderboard · Image AUROC

Image AUROC, ranked.

Area under ROC curve for the image-level defective-vs-pristine decision, averaged over 15 categories. (higher is better)

| # | Model | Image AUROC | Verified | Source |
|---|---|---|---|---|
| 01 | SimpleNet | 99.6 | | codesota-api |
| 02 | PatchCore | 99.1 | | codesota-api |
| 03 | EfficientAD | 99.1 | | codesota-api |

All rows fetched from CodeSOTA API on 2026-04-20.

Fig · 3 results on Image AUROC. Rows sourced from benchmarks.json; shaded row marks current SOTA.
§ 02 · Leaderboard · AUROC (lowercase metric tag)

AUROC (lowercase metric tag), ranked.

Same quantity as Image AUROC, recorded under the lowercase tag that some submissions use. (higher is better)

| # | Model | AUROC (lowercase metric tag) | Verified | Source |
|---|---|---|---|---|
| 01 | simplenet | 99.6 | | codesota-api |
| 02 | fastflow | 99.4 | | codesota-api |
| 03 | patchcore | 99.1 | | codesota-api |
| 04 | efficientad | 99.1 | | codesota-api |
| 05 | reverse-distillation | 98.5 | | codesota-api |
| 06 | cflow-ad | 98.3 | | codesota-api |
| 07 | draem | 98.0 | | codesota-api |
| 08 | padim | 97.9 | | codesota-api |

All rows fetched from CodeSOTA API on 2026-04-20.

Fig · 8 results on AUROC (lowercase metric tag). Rows sourced from benchmarks.json; shaded row marks current SOTA.
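
Since both leaderboards record the same quantity under two tags, a reader may want a single combined view. The following is a minimal sketch of one way to merge them, assuming model names differ only in case and punctuation. The helper names `normalize_model` and `merge_leaderboards` are hypothetical and are not part of any CodeSOTA API; the scores are copied from the two tables above.

```python
import re

def normalize_model(name):
    # Lowercase and drop non-alphanumerics so "SimpleNet" and "simplenet" match.
    return re.sub(r"[^a-z0-9]", "", name.lower())

def merge_leaderboards(*tables):
    """Merge (model, score) rows, keeping the best score per normalized name."""
    merged = {}
    for table in tables:
        for model, score in table:
            key = normalize_model(model)
            # Keep the first-seen display name unless a strictly higher score appears.
            if key not in merged or score > merged[key][1]:
                merged[key] = (model, score)
    # Highest score first; Python's sort is stable, so ties keep insertion order.
    return sorted(merged.values(), key=lambda row: -row[1])

# Rows copied from the "Image AUROC" and lowercase "AUROC" leaderboards.
image_auroc_rows = [("SimpleNet", 99.6), ("PatchCore", 99.1), ("EfficientAD", 99.1)]
auroc_rows = [
    ("simplenet", 99.6), ("fastflow", 99.4), ("patchcore", 99.1),
    ("efficientad", 99.1), ("reverse-distillation", 98.5),
    ("cflow-ad", 98.3), ("draem", 98.0), ("padim", 97.9),
]

combined = merge_leaderboards(image_auroc_rows, auroc_rows)
for rank, (model, score) in enumerate(combined, start=1):
    print(f"{rank:02d} {model} {score}")
```

Under these assumptions the three rows of the first table fold into their lowercase counterparts, so the merged list has eight entries.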
§ What it measures

Image AUROC, near the ceiling.

Image AUROC measures the area under the Receiver Operating Characteristic curve for the binary defective-versus-pristine decision at the image level, reported here as a percentage. A perfect detector scores 100 and a random one scores 50. The current SOTA sits at 99.6, so the leaderboard is effectively saturated, and the interesting research has moved to pixel-level AUROC and PRO (per-region overlap).
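
To make the metric concrete, here is a small dependency-free sketch that computes image-level AUROC from its pairwise-ranking definition: the fraction of (defective, pristine) image pairs in which the defective image receives the higher anomaly score, with ties counted as half. The function name `image_auroc` and the toy scores are illustrative assumptions, not MVTec AD results or the benchmark's official evaluation code.

```python
def image_auroc(scores, labels):
    """Return image-level AUROC as a percentage.

    scores: one anomaly score per image (higher means more anomalous)
    labels: 1 for defective, 0 for pristine
    """
    defective = [s for s, y in zip(scores, labels) if y == 1]
    pristine = [s for s, y in zip(scores, labels) if y == 0]
    if not defective or not pristine:
        raise ValueError("need at least one defective and one pristine image")

    # AUROC equals the probability that a random defective image
    # outscores a random pristine one; ties count as half a win.
    wins = 0.0
    for d in defective:
        for p in pristine:
            if d > p:
                wins += 1.0
            elif d == p:
                wins += 0.5
    return 100.0 * wins / (len(defective) * len(pristine))

# Toy data, made up for illustration only.
scores = [0.1, 0.2, 0.35, 0.8, 0.3, 0.9]
labels = [0, 0, 0, 1, 1, 1]
toy_auroc = image_auroc(scores, labels)
print(round(toy_auroc, 1))
```

The quadratic loop is chosen for readability; production evaluation code would typically use a sorted-rank implementation from an established library instead.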

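Because scores cluster in the 97–99 range, the gaps between rows are small, and a tenth of a point deserves a careful reading rather than a quick dismissal. AUROC is a ranking metric, so such a gap does not convert directly into a count of missed defects, but at production volumes even a small loss in separability can affect a meaningful number of inspections per shift.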

§ Dataset details

Real factory images, real defects.

MVTec AD was released by MVTec Software GmbH. The dataset contains real industrial-inspection images across 15 object and texture categories (bottle, cable, capsule, carpet, grid, hazelnut, leather, metal nut, pill, screw, tile, toothbrush, transistor, wood, zipper), with pixel-accurate annotations of defect regions.

Models train on defect-free examples only; the test split contains both normal and defective samples, and the score is the mean over the 15 categories.
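
The headline number is therefore an unweighted mean of per-category scores. A minimal sketch of that aggregation step follows, assuming each category's Image AUROC has already been computed on its own test split. The function name `mean_over_categories` and the per-category values are placeholders for illustration, not results from any listed model.

```python
from statistics import mean

# The 15 categories named in the dataset description.
CATEGORIES = [
    "bottle", "cable", "capsule", "carpet", "grid", "hazelnut", "leather",
    "metal nut", "pill", "screw", "tile", "toothbrush", "transistor",
    "wood", "zipper",
]

def mean_over_categories(per_category_auroc, expected=CATEGORIES):
    """Unweighted mean of per-category scores; refuses partial results."""
    missing = sorted(set(expected) - set(per_category_auroc))
    if missing:
        raise ValueError(f"missing categories: {missing}")
    return mean(per_category_auroc[c] for c in expected)

# Placeholder scores (hypothetical, for illustration only).
example_scores = {c: 99.0 for c in CATEGORIES}
example_scores["screw"] = 98.0
example_scores["bottle"] = 100.0

overall = mean_over_categories(example_scores)
print(f"mean Image AUROC over {len(CATEGORIES)} categories: {overall:.1f}")
```

Rejecting incomplete inputs is a deliberate choice in this sketch: a mean over fewer than 15 categories would not be comparable with the leaderboard rows.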

§ How scores are verified

Reported, then reproduced.

Each row above is a reported Image AUROC from the submitting paper or repository. Values here are preserved verbatim — where a paper reports different numbers under different inference settings (e.g. with or without test-time augmentation), the row reflects the best reported figure the authors stand behind.

Full policy: /methodology.
