Industrial Vision · Practical Guide · 2026

Anomaly Detection for Manufacturing
From MVTec benchmarks to production inspection lines

Manual visual inspection catches roughly 80% of defects on a good day. Trained anomaly detection models reach 99.8% AUROC on the standard benchmark. This guide compares six leading approaches, shows how to deploy them, and gives you the numbers to build the business case.

March 2026 | 22 min read | 6 models compared

TL;DR

  • Best overall: EfficientAD (99.8% AUROC, real-time speed) for most production lines
  • Best accuracy per sample: PatchCore (99.6% with as few as 10 normal images)
  • Zero-shot option: WinCLIP if you have zero training images, but expect a 4-5% accuracy gap
  • Need explainability? AnomalyGPT gives natural language defect descriptions but is 10-20x slower
  • Deployment: ONNX export to Jetson for edge; FastAPI + ONNX Runtime for centralized

Why Manufacturing Needs Anomaly Detection

  • ~$3.1T: annual cost of poor quality in manufacturing (ASQ estimate)
  • 80%: typical human inspector accuracy under sustained workload
  • 99.8%: best-in-class ML model accuracy on the MVTec AD benchmark

Traditional quality control relies on rule-based machine vision (thresholding, template matching) or human inspectors. Both break down as product complexity increases:

  • Rule-based vision requires explicit programming for every defect type. A new scratch pattern means a new rule. Natural variation in materials triggers false positives.
  • Human inspectors fatigue after 20-30 minutes of sustained attention. Accuracy drops from ~95% to ~75% over a shift. They cannot inspect at line speeds of 1,000+ units/hour.
  • Supervised ML (classification/segmentation) works well but requires labeled defect images. In manufacturing, defects are rare by design. You might see 1 defective part per 1,000 -- not enough to train a classifier.

Anomaly detection solves this by learning only from normal images. The model learns what "good" looks like, then flags anything that deviates. No defect labels needed.
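The core idea fits in a few lines: store features of known-good samples, then score a new sample by its distance to the nearest stored normal feature. The sketch below is illustrative only, with made-up 2-D feature vectors standing in for real image embeddings; it is not any specific paper's method.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class NearestNormalDetector:
    """Anomaly score = distance to the closest feature seen during normal-only training."""

    def __init__(self):
        self.bank = []  # memory of normal feature vectors

    def fit(self, normal_features):
        self.bank = list(normal_features)

    def score(self, feature):
        return min(euclidean(feature, f) for f in self.bank)

# Toy example: "good" parts cluster near (1, 1); a defect lands far away.
detector = NearestNormalDetector()
detector.fit([(1.0, 1.0), (1.1, 0.9), (0.9, 1.1)])
print(detector.score((1.0, 1.05)))  # small distance → normal
print(detector.score((3.0, 3.0)))   # large distance → anomalous
```

Real systems replace the toy vectors with deep features from a pretrained backbone, but the scoring principle is the same: no defect labels appear anywhere.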

The MVTec AD Benchmark

MVTec Anomaly Detection (MVTec AD) is the standard benchmark for unsupervised anomaly detection in industrial inspection. Released in 2019 by MVTec Software GmbH, it contains 5,354 high-resolution images across 15 categories of real-world industrial products and textures.

| Category | Type | Train (Normal) | Test (Normal) | Test (Anomalous) | Defect Types |
|---|---|---|---|---|---|
| Bottle | object | 209 | 20 | 63 | Broken large, broken small, contamination |
| Cable | object | 224 | 58 | 92 | Bent wire, cable swap, cut inner/outer, missing cable, poke |
| Capsule | object | 219 | 23 | 109 | Crack, faulty imprint, poke, scratch, squeeze |
| Carpet | texture | 280 | 28 | 89 | Color, cut, hole, metal contamination, thread |
| Hazelnut | object | 391 | 40 | 70 | Crack, cut, hole, print |
| Leather | texture | 245 | 32 | 92 | Color, cut, fold, glue, poke |
| Metal Nut | object | 220 | 22 | 93 | Bent, color, flip, scratch |
| Pill | object | 267 | 26 | 141 | Color, combined, contamination, crack, faulty imprint, scratch, type |
| Screw | object | 320 | 41 | 119 | Manipulated front, scratch head/neck, thread side/top |
| Tile | texture | 230 | 33 | 84 | Crack, glue strip, gray stroke, oil, rough |
| Toothbrush | object | 60 | 12 | 30 | Defective |
| Transistor | object | 213 | 60 | 40 | Bent lead, cut lead, damaged case, misplaced |
| Wood | texture | 247 | 19 | 60 | Color, combined, hole, liquid, scratch |
| Zipper | object | 240 | 32 | 119 | Broken teeth, combined, fabric border, fabric interior, rough, split teeth, squeezed teeth |
| Grid | texture | 264 | 21 | 57 | Bent, broken, glue, metal contamination, thread |

Image-level AUROC

Binary classification: is this image normal or anomalous? Measured as Area Under the ROC Curve. A score of 99.8% means the model almost perfectly separates good parts from defective ones.

Pixel-level AUROC

Localization: can the model pinpoint where the defect is? Each pixel is scored as normal or anomalous. Critical for operators who need to see exactly what went wrong.
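Image-level AUROC has a useful direct interpretation: it is the probability that a randomly chosen anomalous image receives a higher anomaly score than a randomly chosen normal one (ties counted as half). A minimal sketch with hypothetical scores, computed via the pairwise (Mann-Whitney) formulation:

```python
def auroc(normal_scores, anomalous_scores):
    """AUROC as the fraction of (anomalous, normal) pairs where the
    anomalous image gets the strictly higher score (ties count 0.5)."""
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(normal_scores) * len(anomalous_scores))

# Hypothetical model outputs (higher = more anomalous)
normal = [0.10, 0.15, 0.20, 0.25]
anomalous = [0.30, 0.80, 0.18]
print(auroc(normal, anomalous))  # one anomalous image overlaps the normal range
```

Pixel-level AUROC is computed the same way, but over per-pixel scores against the ground-truth defect masks.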

Model Comparison on MVTec AD

All scores are mean AUROC (%) across 15 MVTec AD categories. FPS measured on NVIDIA A100 at 256x256 resolution.

| Model | Year | Image AUROC | Pixel AUROC | FPS | Approach | Training Data |
|---|---|---|---|---|---|---|
| PatchCore | 2022 | 99.6% | 98.1% | ~5-12 | Memory bank + k-NN | Few normal samples |
| EfficientAD | 2024 | 99.8% | 98.8% | ~50-80 | Student-teacher + autoencoder | Normal samples only |
| SimpleNet | 2023 | 99.6% | 98.1% | ~70-85 | Feature adaptor + discriminator | Normal samples only |
| DRAEM | 2021 | 98.0% | 97.3% | ~25-40 | Reconstruction + synthetic anomalies | Normal + synthetic defects |
| AnomalyGPT | 2024 | 96.3% | 95.2% | ~2-5 | LVLM + in-context learning | Zero-shot or few-shot |
| WinCLIP | 2023 | 95.2% | 93.8% | ~15-25 | CLIP + window-based scoring | Zero-shot (text prompts) |

PatchCore (2022)

Paper: "Towards Total Recall in Industrial Anomaly Detection"

Image AUROC: 99.6% | Pixel AUROC: 98.1% | Speed: ~5-12 FPS
Strength: High accuracy with minimal data; simple to deploy
Weakness: Memory grows linearly with coreset size; slow at scale
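The "coreset" that bounds PatchCore's memory bank can be sketched as greedy k-center selection: repeatedly keep the feature farthest from everything already selected, so a small subset still covers the normal feature space. This is a toy illustration with 2-D points, not the paper's exact implementation (which operates on deep patch features):

```python
import math

def kcenter_coreset(features, keep):
    """Greedily pick `keep` features that maximize coverage of the full set."""
    selected = [features[0]]  # arbitrary seed
    # distance from each point to its nearest selected center
    dists = [math.dist(f, selected[0]) for f in features]
    while len(selected) < keep:
        far_idx = max(range(len(features)), key=lambda i: dists[i])
        selected.append(features[far_idx])
        for i, f in enumerate(features):
            dists[i] = min(dists[i], math.dist(f, features[far_idx]))
    return selected

# Two tight clusters plus one outlier; the coreset keeps spread-out representatives
feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
print(kcenter_coreset(feats, 3))
```

This is why `coreset_sampling_ratio=0.1` in the training code below can discard 90% of patches with little accuracy loss: near-duplicate patch features add nothing to coverage.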

EfficientAD (2024)

Paper: "EfficientAD: Accurate Visual Anomaly Detection at Millisecond-Level Latencies"

Image AUROC: 99.8% | Pixel AUROC: 98.8% | Speed: ~50-80 FPS
Strength: Best accuracy-speed tradeoff; real-time capable
Weakness: Requires careful hyperparameter tuning per category

SimpleNet (2023)

Paper: "SimpleNet: A Simple Network for Image Anomaly Detection and Localization"

Image AUROC: 99.6% | Pixel AUROC: 98.1% | Speed: ~70-85 FPS
Strength: Extremely fast inference; lightweight architecture
Weakness: Slightly lower pixel-level localization on textures

DRAEM (2021)

Paper: "DRAEM: A Discriminatively Trained Reconstruction Embedding for Surface Anomaly Detection"

Image AUROC: 98.0% | Pixel AUROC: 97.3% | Speed: ~25-40 FPS
Strength: Generates its own training anomalies; no real defect data needed
Weakness: Synthetic anomalies may not match real defect distributions

AnomalyGPT (2024)

Paper: "AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models"

Image AUROC: 96.3% | Pixel AUROC: 95.2% | Speed: ~2-5 FPS
Strength: Natural language explanations of defects; zero-shot capable
Weakness: Slow inference; requires large GPU; lower raw accuracy

WinCLIP (2023)

Paper: "WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation"

Image AUROC: 95.2% | Pixel AUROC: 93.8% | Speed: ~15-25 FPS
Strength: No training images needed at all; prompt-based
Weakness: Accuracy gap vs trained methods; struggles with subtle defects

Zero-Shot vs Few-Shot Approaches

The biggest practical question in manufacturing AD: how many normal images do you need?

Zero-Shot (0 images)

Models like WinCLIP use text prompts ("a photo of a damaged bottle") and vision-language pretraining. No product-specific training.

Typical accuracy: 93-95% Image AUROC

Few-Shot (1-16 images)

PatchCore with 2-4 shots achieves ~97% AUROC. AnomalyGPT with in-context examples reaches ~96%. Practical for new product onboarding.

Typical accuracy: 96-98% Image AUROC

Full Training (200+ images)

Standard unsupervised training with the full normal dataset. EfficientAD and PatchCore both exceed 99.5% AUROC with adequate normal samples.

Typical accuracy: 99-99.8% Image AUROC

Practical recommendation

Start with zero-shot (WinCLIP) to validate the concept. Collect 10-50 normal images from the line and switch to PatchCore for a quick accuracy boost. Once you have 200+ images (usually 1-2 days of production), train EfficientAD for the final deployment. This staged approach lets you demonstrate value within days while building toward peak accuracy.
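The staged rollout above reduces to a simple policy keyed on how many normal images you have collected so far. The sketch below just encodes the thresholds from the text; the function name and cutoffs are illustrative, not from any library:

```python
def recommend_model(num_normal_images: int) -> str:
    """Staged rollout policy: zero-shot → few-shot → full training,
    using the image-count thresholds described in the text."""
    if num_normal_images == 0:
        return "WinCLIP (zero-shot, text prompts)"
    if num_normal_images < 200:
        return "PatchCore (few-shot)"
    return "EfficientAD (full training)"

for n in (0, 25, 500):
    print(n, "images →", recommend_model(n))
```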

Deployment: Edge vs Cloud

Where you run inference matters as much as which model you pick. Latency budgets, data sovereignty, and cost structure all depend on deployment topology.

Edge (NVIDIA Jetson / Hailo)

Latency: <20ms
Cost: $200-800 per unit

Best for: High-speed lines, air-gapped facilities

Pros
  • + No network dependency
  • + Lowest latency
  • + Data stays on-premise
  • + Scales with line count
Cons
  • - Limited model size
  • - Harder to update models
  • - Per-unit hardware cost

On-premise GPU Server

Latency: 20-50ms
Cost: $5K-15K one-time

Best for: Multi-line facilities, mixed workloads

Pros
  • + Full model flexibility
  • + Centralized management
  • + Shared across lines
  • + Easy model updates
Cons
  • - Network latency to cameras
  • - Single point of failure
  • - Upfront capex

Cloud (AWS/GCP)

Latency: 50-200ms
Cost: $0.50-2/hr GPU

Best for: Prototyping, low-volume, multi-site aggregation

Pros
  • + No hardware investment
  • + Auto-scaling
  • + Latest models available
  • + Central dashboard
Cons
  • - Network dependency
  • - Data privacy concerns
  • - Ongoing opex
  • - Not viable for high-speed lines
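Whether a tier fits a given line is a simple budget check: the time available per part is 3600 / (units per hour) seconds, and worst-case inference latency plus network round-trip must fit well inside it. A quick sanity-check sketch using the latency figures above (the 50% headroom factor is an assumption, not a standard):

```python
def per_unit_budget_ms(units_per_hour: float) -> float:
    """Time available per part, in milliseconds."""
    return 3_600_000 / units_per_hour

def fits(units_per_hour: float, worst_case_latency_ms: float,
         headroom: float = 0.5) -> bool:
    """Require inference to consume at most `headroom` of the per-unit budget."""
    return worst_case_latency_ms <= headroom * per_unit_budget_ms(units_per_hour)

print(per_unit_budget_ms(1000))  # 1,000 units/hr → 3600 ms per unit
print(fits(1000, 200))           # cloud worst case (200 ms) fits at this speed
print(fits(36000, 200))          # hypothetical 36,000 units/hr line: it does not
```

This is why the table marks cloud as "not viable for high-speed lines": the 50-200 ms round-trip alone can exceed the entire per-unit budget.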

Code: From Training to Production

Complete workflow using anomalib, ONNX Runtime, and FastAPI.

patchcore_train.py: Train PatchCore on MVTec AD
# PatchCore training with anomalib (v2)
from anomalib.models import Patchcore
from anomalib.engine import Engine
from anomalib.data import MVTec

# 1. Configure model
model = Patchcore(
    backbone="wide_resnet50_2",
    layers=["layer2", "layer3"],
    coreset_sampling_ratio=0.1,  # keep 10% of patches in the memory bank
    num_neighbors=9,
)

# 2. Load MVTec AD dataset (or your custom dataset)
datamodule = MVTec(
    root="./datasets/MVTec",
    category="bottle",
    image_size=(256, 256),
    train_batch_size=32,
    eval_batch_size=32,
)

# 3. Train (builds memory bank from normal images)
engine = Engine(max_epochs=1)  # PatchCore needs only 1 epoch
engine.fit(model=model, datamodule=datamodule)

# 4. Evaluate
results = engine.test(model=model, datamodule=datamodule)
# Returns: image_AUROC, pixel_AUROC, F1, PRO score
export_and_inference.py: Export to ONNX and run on edge
# Export trained model to ONNX for edge deployment
from anomalib.deploy import ExportType

# Export to ONNX (quantize to INT8 separately for Jetson if needed)
engine.export(
    model=model,
    export_type=ExportType.ONNX,
    export_root="./exported_models/bottle_patchcore",
)

# ── Inference on edge device ──
import onnxruntime as ort
import numpy as np
from PIL import Image

session = ort.InferenceSession(
    "bottle_patchcore/model.onnx",
    providers=[
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",  # fallback when no GPU is available
    ],
)

def inspect(image_path: str, threshold: float = 0.5):
    img = Image.open(image_path).convert("RGB").resize((256, 256))
    input_array = np.array(img).astype(np.float32) / 255.0
    input_array = np.transpose(input_array, (2, 0, 1))  # HWC → CHW
    input_array = np.expand_dims(input_array, axis=0)   # add batch dim

    outputs = session.run(None, {"input": input_array})
    anomaly_score = outputs[0][0]    # Scalar score
    anomaly_map = outputs[1][0]      # Pixel-level heatmap

    return {
        "is_defective": float(anomaly_score) > threshold,
        "confidence": float(anomaly_score),
        "heatmap": anomaly_map,
    }
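The `threshold=0.5` default above is a placeholder: raw anomaly scores are not calibrated probabilities, and a sensible cut-off varies per product and model. A common practice is to derive the threshold from the score distribution of held-out normal images, e.g. a high percentile plus a safety margin. A sketch (the percentile, margin, and score values are tunable assumptions, not library defaults):

```python
def calibrate_threshold(normal_scores, percentile=99.0, margin=1.05):
    """Set threshold = the given percentile of scores from defect-free
    validation images, times a safety margin. Scores above it are flagged."""
    ranked = sorted(normal_scores)
    # nearest-rank percentile (no interpolation, stdlib only)
    idx = min(len(ranked) - 1, int(round(percentile / 100 * len(ranked))) - 1)
    return ranked[max(idx, 0)] * margin

# Hypothetical scores from ~100 known-good parts
scores = [0.10 + 0.002 * i for i in range(100)]
print(calibrate_threshold(scores))
```

Re-run the calibration whenever lighting, fixtures, or the product itself changes; a stale threshold is the most common cause of false-positive storms on real lines.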
api_server.py: Production FastAPI wrapper
# Production API wrapper for anomaly detection
from fastapi import FastAPI, UploadFile
from contextlib import asynccontextmanager
import onnxruntime as ort
import numpy as np
from PIL import Image
import io, time

models: dict = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load models at startup
    models["bottle"] = ort.InferenceSession("models/bottle.onnx")
    models["capsule"] = ort.InferenceSession("models/capsule.onnx")
    models["metal_nut"] = ort.InferenceSession("models/metal_nut.onnx")
    yield
    models.clear()

app = FastAPI(title="Manufacturing AD API", lifespan=lifespan)

@app.post("/inspect/{product_type}")
async def inspect(product_type: str, file: UploadFile):
    if product_type not in models:
        return {"error": f"No model for {product_type}"}

    t0 = time.perf_counter()
    img = Image.open(io.BytesIO(await file.read())).convert("RGB").resize((256, 256))
    input_arr = np.expand_dims(
        np.transpose(np.array(img, dtype=np.float32) / 255.0, (2, 0, 1)),
        axis=0,
    )

    outputs = models[product_type].run(None, {"input": input_arr})
    latency_ms = (time.perf_counter() - t0) * 1000

    return {
        "product_type": product_type,
        "anomaly_score": float(outputs[0][0]),
        "is_defective": float(outputs[0][0]) > 0.5,
        "latency_ms": round(latency_ms, 1),
    }

ROI: Building the Business Case

Cost savings come from three sources: reduced manual inspection labor, fewer escaped defects (warranty/recall costs), and higher throughput from automated inspection at line speed.

| Industry | Inspection Rate | Defect Rate | Manual Cost/Unit | AI Cost/Unit | Annual Savings | Payback |
|---|---|---|---|---|---|---|
| PCB Assembly | 1,200 units/hr | 2-5% | $0.08 | $0.002 | $340K | 4 mo |
| Automotive Parts | 300 units/hr | 0.5-1% | $0.25 | $0.01 | $520K | 3 mo |
| Pharmaceutical | 5,000 units/hr | 0.1-0.3% | $0.04 | $0.001 | $890K | 2 mo |
| Textile / Fabric | 50 m/min | 3-8% | $0.15 | $0.005 | $210K | 6 mo |

Key cost drivers

  • Hardware: NVIDIA Jetson Orin Nano ($200) handles EfficientAD at 50+ FPS. One unit per camera. Amortized over 3-5 years.
  • Integration: Camera mounting, lighting control, and PLC integration typically cost 2-3x the hardware. Budget $2K-5K per inspection station.
  • Maintenance: Model retraining when products change. Budget 2-4 hours of ML engineering per product variant per quarter.
  • Hidden savings: Automated inspection generates defect data that feeds back into process engineering. Teams report 15-30% reduction in defect generation within 6 months of deployment.
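The payback column reduces to simple arithmetic: monthly savings = units inspected per month × (manual cost − AI cost) per unit, and payback = upfront cost ÷ monthly savings. A sketch with hypothetical inputs (the shift hours and upfront cost below are illustrative, not the table's exact assumptions):

```python
def payback_months(units_per_hour: float, hours_per_month: float,
                   manual_cost: float, ai_cost: float,
                   upfront_cost: float) -> float:
    """Months until per-unit inspection savings cover the upfront investment."""
    monthly_savings = units_per_hour * hours_per_month * (manual_cost - ai_cost)
    return upfront_cost / monthly_savings

# Hypothetical PCB line: 1,200 units/hr, two shifts (~350 hr/month),
# $0.08 manual vs $0.002 AI per unit, $100K for several inspection stations
print(round(payback_months(1200, 350, 0.08, 0.002, 100_000), 1))
```

Plugging in your own line speed, labor rates, and station count is usually enough for a first-pass business case; escaped-defect and throughput gains only improve it.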

Which Model Should You Use?

| Scenario | Use | Why |
|---|---|---|
| High-speed production line, stable products | EfficientAD | Best accuracy at real-time speed. Train once per product, deploy to edge. |
| Frequent product changeovers, small batches | PatchCore (few-shot) | Reaches 97%+ AUROC with just 4-10 images. Fastest time to deployment. |
| New facility, no training data yet | WinCLIP (zero-shot), then migrate | Start inspecting immediately with text prompts. Collect data for PatchCore/EfficientAD. |
| Regulated industry needing defect explanations | AnomalyGPT | Generates natural language reports. Slower but provides auditable defect descriptions. |
| Resource-constrained edge devices | SimpleNet | Lightest model with strong accuracy. Runs on low-power edge hardware. |
| Complex defect patterns, textured surfaces | DRAEM | Synthetic anomaly generation handles texture categories well. |
