New section · April 2026

Embedded AI
starting with Hailo

The independent catalog for edge AI on Hailo accelerators. Chip lineup, Model Zoo benchmarks, and pre-compiled HEF files for YOLOv11, YOLO26, PaddleOCR, CLIP, and on-device Llama 3.

Our goal: be the Hugging Face for Hailo — one page per model, one click to a verified HEF for your chip.

What’s here

  • 5 Hailo chips covered
  • 21 benchmarked models
  • 9 detection / seg variants
  • 3 on-device LLMs (Hailo-10H)

Why Hailo for edge

Hailo builds NPUs that keep model memory on die, so inference avoids external DRAM lookups. That design delivers high performance per watt, which is why Hailo accelerators ship in the official Raspberry Pi AI Kit (Hailo-8L) and AI HAT+ (Hailo-8L or Hailo-8), and why Hailo-10H can run a 3B LLM at about 2.5 W.

Integrated memory

Weights and activations stay on-die. Predictable latency, no DRAM thrash, low power. The trade-off: models must fit or be partitioned by the Hailo compiler.
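
As a rough illustration of the "must fit or be partitioned" trade-off, the sketch below estimates weight storage from parameter count and bit width. The parameter counts come from the catalog tables on this page; the budget value is a made-up placeholder, not a Hailo specification, and activations are ignored.

```python
def weight_footprint_mb(params_millions: float, bits_per_weight: int = 8) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return params_millions * bits_per_weight / 8


def fits_on_die(params_millions: float, budget_mb: float, bits_per_weight: int = 8) -> bool:
    """True if the quantized weights alone fit inside the given memory budget."""
    return weight_footprint_mb(params_millions, bits_per_weight) <= budget_mb


# Parameter counts (in millions) taken from the catalog tables on this page.
MODELS = {"YOLOv11n": 2.6, "ResNet-50": 25.6, "CLIP ViT-L/14": 304.0}

# Hypothetical budget for illustration only; NOT a published Hailo figure.
HYPOTHETICAL_BUDGET_MB = 32.0

for name, params_m in MODELS.items():
    size_mb = weight_footprint_mb(params_m)
    verdict = "fits" if fits_on_die(params_m, HYPOTHETICAL_BUDGET_MB) else "needs partitioning"
    print(f"{name}: {size_mb:.1f} MB of INT8 weights -> {verdict}")
```

The real decision is made by the Hailo compiler, which also accounts for activations and layer scheduling; this only shows why parameter count and bit width matter.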

Fixed-point at deploy

Every model is quantized (INT8, or INT4 for LLMs on 10H) and compiled into an HEF binary. You train in a framework such as PyTorch, export to ONNX, then run the Dataflow Compiler to produce the HEF.
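
To make "quantized" concrete, here is a minimal sketch of generic symmetric INT8 quantization with a scale derived from calibration values. It is a textbook scheme chosen for illustration, not the Dataflow Compiler's actual algorithm, and all function names are hypothetical.

```python
def calibrate_scale(calibration_values, num_bits=8):
    """Pick a scale so the largest calibration magnitude maps to the top integer level."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round each value to the nearest integer level and clip to the signed range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integer levels back to approximate real values."""
    return [q * scale for q in q_values]


weights = [0.02, -0.75, 0.31, 1.27]
scale = calibrate_scale(weights)
q = quantize(weights, scale)
restored = dequantize(q, scale)
print(q)         # integer levels stored on the device
print(restored)  # values the hardware effectively computes with

# A value outside the calibrated range is clipped to the top level.
print(quantize([2.0], scale))
```

The last line shows why calibration data matters: anything the calibration set did not cover saturates at the edge of the integer range.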

Linux-first stack

HailoRT runs on x86, ARM, and Raspberry Pi. GStreamer, Python, C++ bindings. No CUDA, no driver hell: just a kernel module and a userspace runtime.

Hailo chip lineup

From 7 TOPS vision SoCs to 40 TOPS generative-AI accelerators. Pick the chip, then pick the model below.

Chip | Family | Performance | Power | Form factor | Best for | Status
Hailo-8L | Accelerator | 13 TOPS (INT8) | ~1.5 W typical | M.2 / PCIe | Cost-sensitive edge: single-stream detection, smart home, POS | Shipping
Hailo-8 | Accelerator | 26 TOPS (INT8) | ~2.5 W typical | M.2 / PCIe / SoM | Multi-stream CV: smart cameras, retail analytics, Raspberry Pi AI HAT+ (26 TOPS) | Shipping
Hailo-10H | Accelerator | 40 TOPS (INT4) | ~2.5 W typical | M.2 | On-device LLMs/VLMs, Llama 3 8B at 10+ tok/s, generative edge AI | New
Hailo-15H | Vision Processor | 20 TOPS (INT8) | ~3-5 W | SoC (VPU) | High-end smart cameras with on-chip ISP + NN core | Shipping
Hailo-15L | Vision Processor | 7 TOPS (INT8) | ~2 W | SoC (VPU) | Mass-market IP cameras replacing traditional SoCs | Mass market

Hailo-10H launched July 2025 as the first edge accelerator with on-chip generative-AI capability. Hailo-8 remains the workhorse for multi-stream computer vision.
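
The perf-per-watt comparison implied by the lineup can be worked out directly from the table. The sketch below uses the table's TOPS and typical power figures; the 4.0 W value for Hailo-15H is an assumption (the midpoint of the ~3-5 W range), and the Hailo-10H figure is INT4, so it is not directly comparable with the INT8 rows.

```python
# (TOPS, typical watts) copied from the chip lineup table.
CHIPS = {
    "Hailo-8L": (13, 1.5),
    "Hailo-8": (26, 2.5),
    "Hailo-10H": (40, 2.5),  # INT4 TOPS; the other rows are INT8
    "Hailo-15H": (20, 4.0),  # assumption: midpoint of the ~3-5 W range
    "Hailo-15L": (7, 2.0),
}


def tops_per_watt(tops: float, watts: float) -> float:
    """Peak throughput divided by typical power draw."""
    return tops / watts


for chip, (tops, watts) in CHIPS.items():
    print(f"{chip}: {tops_per_watt(tops, watts):.1f} TOPS/W")
```

With these inputs the accelerators land at roughly 8.7 (Hailo-8L), 10.4 (Hailo-8), and 16.0 (Hailo-10H) TOPS/W, and the vision processors at 5.0 and 3.5. Peak TOPS says little about real model throughput, so treat this as a first filter only.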

Pre-compiled model catalog

FPS numbers are from the public Hailo Model Zoo benchmark tables — INT8, batch 1, on reference boards. LLM rows are decode throughput at INT4 on Hailo-10H with 2K context.

Detection

Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
YOLOv11n | 640×640 | 2.6M | 135 | 210 | 260 | 240 | Latest YOLO nano, NMS on-chip
YOLOv11s | 640×640 | 9.4M | 72 | 140 | 175 | 160 | Balanced accuracy/speed
YOLOv11m | 640×640 | 20.1M | 38 | 70 | 95 | 85 | Higher mAP for demanding scenes
YOLOv8n | 640×640 | 3.2M | 150 | 235 | 275 | 255 | Most deployed edge detector
YOLOv8s | 640×640 | 11.2M | 78 | 150 | 180 | 165 |
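
A common way to read these FPS figures is as a camera budget. The sketch below divides the YOLOv8n throughput from the table by a per-camera frame rate; it assumes throughput is shared linearly across streams and ignores decode, pre-processing, and scheduling overhead, so real deployments will usually land lower.

```python
# YOLOv8n frames/sec per chip, from the Detection table.
YOLOV8N_FPS = {"Hailo-8L": 150, "Hailo-8": 235, "Hailo-10H": 275, "Hailo-15H": 255}


def max_streams(chip_fps: float, fps_per_stream: float = 30) -> int:
    """Upper bound on concurrent camera streams at a given per-stream frame rate."""
    return int(chip_fps // fps_per_stream)


for chip, fps in YOLOV8N_FPS.items():
    print(f"{chip}: up to {max_streams(fps)} streams at 30 FPS")
```

Under these assumptions the bound is 5 streams on Hailo-8L, 7 on Hailo-8, 9 on Hailo-10H, and 8 on Hailo-15H.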

Detection / Seg

Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
YOLO26n | 640×640 | 3.0M | - | - | 250 | 230 | NMS-free, newest family

Oriented BBox

Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
YOLOv11n-obb | 640×640 | 2.7M | - | - | 210 | 195 | Rotated boxes for aerial/industrial

Instance Seg

Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
YOLOv8n-seg | 640×640 | 3.4M | 85 | 155 | 190 | 175 |
YOLOv5n-seg-hpp | 640×640 | 2.0M | 120 | 195 | 230 | 215 | HailoRT-accelerated post-process

Pose

Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
YOLOv8n-pose | 640×640 | 3.3M | 88 | 160 | 195 | 180 | 17-keypoint human pose

Classification

Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
ResNet-50 | 224×224 | 25.6M | 720 | 1,390 | 1,750 | 1,500 | ImageNet reference
MobileNet V3 | 224×224 | 5.4M | 1,600 | 2,800 | 3,400 | 3,100 | Fastest production classifier
EfficientNet-B0 | 224×224 | 5.3M | 1,020 | 1,850 | 2,300 | 2,050 |

OCR

Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
PaddleOCR-v5 (det+rec) | Multi | ~12M | 22 | 45 | 65 | 58 | Latest PP-OCR pipeline

Face Detection

Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
RetinaFace MobileNet | 736×1280 | 0.4M | 85 | 140 | 165 | 150 |

Face Recognition

Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
ArcFace R50 | 112×112 | 43.6M | 380 | 720 | 890 | 800 |

Monocular Depth

Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
FastDepth | 224×224 | 3.9M | 380 | 640 | 790 | 710 |

Embeddings

Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
CLIP ViT-L/14 (Laion2B) | 224×224 | 304M | - | - | 42 | 28 | Image embeddings for retrieval

LLM (INT4)

Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
Llama 3.2 3B |  | 3.2B | - | - | 28 | - | tok/s decode, 2K ctx
Llama 3.1 8B |  | 8.0B | - | - | 11 | - | tok/s decode, 2K ctx
Qwen 2.5 1.5B |  | 1.5B | - | - | 45 | - | tok/s decode

Numbers in the LLM table are tokens/sec (decode); everything else is frames/sec. A dash means the model isn’t officially compiled for that chip, usually because it exceeds the SRAM budget or the chip targets a different task class.
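
Tokens/sec is easier to judge as wall-clock time per response. The sketch below converts the decode rates from the LLM table into seconds for a reply of a given length; it covers the decode phase only and ignores prompt prefill, which the table does not report.

```python
# Decode throughput (tokens/sec) on Hailo-10H, from the LLM table.
LLM_DECODE_TOK_S = {"Llama 3.2 3B": 28, "Llama 3.1 8B": 11, "Qwen 2.5 1.5B": 45}


def decode_seconds(num_tokens: int, tok_per_s: float) -> float:
    """Time to generate num_tokens at a steady decode rate (prefill excluded)."""
    return num_tokens / tok_per_s


REPLY_TOKENS = 256  # illustrative reply length, chosen for this example
for model, rate in LLM_DECODE_TOK_S.items():
    print(f"{model}: {decode_seconds(REPLY_TOKENS, rate):.1f} s for {REPLY_TOKENS} tokens")
```

For a 256-token reply this gives about 9.1 s for Llama 3.2 3B, 23.3 s for Llama 3.1 8B, and 5.7 s for Qwen 2.5 1.5B.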

What is an HEF?

HEF (Hailo Executable Format) is the compiled binary that actually runs on a Hailo chip. You can’t load a PyTorch or ONNX model directly — the Hailo Dataflow Compiler converts it, quantizes the weights, maps ops onto the NPU’s cores and memory, and produces a single .hef file.

Compile pipeline

  1. Train or download a model in ONNX or TensorFlow format (export PyTorch models to ONNX first)
  2. Run hailo parser → Hailo Archive (HAR)
  3. Run hailo optimize with a calibration set → quantized HAR
  4. Run hailo compiler → HEF
  5. Load with HailoRT: .hef → ConfiguredNetworkGroup → inference
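
Step 5 can be sketched with the HailoRT Python bindings (the hailo_platform package). The pattern below follows the style of the HailoRT Python tutorials as I understand them; exact class names and arguments may differ between HailoRT versions, the PCIe interface is an assumption about how the device is attached, and the function name is hypothetical. It needs a Hailo device and an HEF compiled for that device to do anything.

```python
def run_hef_once(hef_path: str):
    """Push one random frame through an HEF and return the raw output dict."""
    # Imports are deferred so this file can be imported on machines without HailoRT.
    import numpy as np
    from hailo_platform import (
        HEF, VDevice, ConfigureParams, HailoStreamInterface,
        InferVStreams, InputVStreamParams, OutputVStreamParams, FormatType,
    )

    hef = HEF(hef_path)
    with VDevice() as device:
        # .hef -> configured network group on the device
        configure_params = ConfigureParams.create_from_hef(
            hef=hef, interface=HailoStreamInterface.PCIe  # assumption: PCIe / M.2 module
        )
        network_group = device.configure(hef, configure_params)[0]
        network_group_params = network_group.create_params()

        # Exchange float32 tensors with the runtime on both sides.
        in_params = InputVStreamParams.make(network_group, format_type=FormatType.FLOAT32)
        out_params = OutputVStreamParams.make(network_group, format_type=FormatType.FLOAT32)

        # Build a random batch of one frame matching the model's first input.
        input_info = hef.get_input_vstream_infos()[0]
        height, width, channels = input_info.shape
        frame = np.random.rand(1, height, width, channels).astype(np.float32)

        with InferVStreams(network_group, in_params, out_params) as pipeline:
            with network_group.activate(network_group_params):
                return pipeline.infer({input_info.name: frame})
```

Calling run_hef_once with the path to an HEF returns a dictionary keyed by output stream name. Post-processing (NMS, decoding boxes, and so on) depends on the model and is left out.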

Why pre-compiled HEFs matter

  • Compile step takes minutes to hours and needs a license
  • Quantization results depend on calibration data — bad calibration, bad accuracy
  • Each chip has a different HEF (Hailo-8 HEF doesn’t run on 10H)
  • Most deployments want “give me YOLOv11n for Hailo-8, verified” — not a compile pipeline

Roadmap

This page is the MVP of a bigger plan — become the Hugging Face of Hailo-compiled models.

Now

Hailo chip + Model Zoo catalog

Chip spec table, per-task FPS benchmarks, links to the official Hailo Model Zoo source.

Next

Per-model pages with HEF downloads

One page per model × chip combo. Verified HEF binary, SHA256, calibration notes, latency numbers, example inference code.
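
Checking a downloaded HEF against its published SHA256 could look like the sketch below. It uses only Python's standard hashlib; the function names are illustrative, and the expected digest would come from the model page.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large HEFs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_hef(path: str, expected_sha256: str) -> bool:
    """Compare the file's digest with the published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

A deployment script would call verify_hef before loading the file and refuse to continue on a mismatch.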

Later

Compile-on-demand

Upload an ONNX model, pick a Hailo chip, get back a compiled HEF. Behind the scenes: our inference server runs the Hailo Dataflow Compiler.

Later

Other edge NPUs

Same catalog pattern for Google Coral (Edge TPU), NVIDIA Jetson, Rockchip RK3588 NPU, and Qualcomm QCS.

Resources

Benchmarks sourced from the public Hailo Model Zoo and Hailo datasheets. Page maintained by CodeSOTA. Last updated April 2026.