Codesota · Embedded AI · Vol. II · The register of edge silicon: every watt, every TOP · Issue: April 22, 2026
§ 00 · Embedded AI

On the edge, measured.

The independent register for NPUs that leave the datacenter behind. Hailo-8, 8L, 10H, 15H and 15L come first, with Jetson Orin, Coral, Rockchip, Qualcomm and the Apple Neural Engine held next to them. Numbers are taken from the public Hailo Model Zoo: INT8 (or INT4 for LLMs on 10H), batch 1, on reference boards.

5 Hailo chips covered · 21 benchmarked models · 9 detection / seg variants · 3 on-device LLMs on Hailo-10H.

§ 01 · Hailo chips

Seven to forty TOPS, under five watts.

Hailo-8 and 8L keep all model memory on die, with no external DRAM lookups during inference. That decision buys predictable latency and very high perf-per-watt; it also means the model has to fit, or be partitioned by the Hailo compiler. Hailo-10H breaks the pattern with an external DDR interface, which is how it holds multi-gigabyte LLM weights.
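To make "has to fit, or be partitioned" concrete, here is a minimal back-of-envelope sketch. It assumes weight storage is roughly params × bits / 8 and uses a hypothetical 16 MB on-die budget (an illustrative number, not a Hailo specification). The real Dataflow Compiler partitions by compute and memory resources, not by a single byte count.

```python
import math

# Hypothetical on-die weight budget, chosen for illustration only; NOT a Hailo specification.
ON_DIE_BUDGET_BYTES = 16e6


def weight_bytes(params: float, bits: int) -> float:
    """Approximate weight storage: params * bits / 8 (ignores activations and metadata)."""
    return params * bits / 8


def partitions_needed(params: float, bits: int, budget_bytes: float = ON_DIE_BUDGET_BYTES) -> int:
    """Naive count of sequential partitions if each may hold at most budget_bytes of weights."""
    if budget_bytes <= 0:
        raise ValueError("budget_bytes must be positive")
    return max(1, math.ceil(weight_bytes(params, bits) / budget_bytes))


# Parameter counts from the model catalog; INT8 weights.
for name, params in [("YOLOv8n", 3.2e6), ("ResNet-50", 25.6e6), ("CLIP ViT-L/14", 304e6)]:
    mb = weight_bytes(params, 8) / 1e6
    print(f"{name:14s} {mb:6.1f} MB -> {partitions_needed(params, 8)} partition(s)")
```

By the same formula, Llama 3.1 8B at INT4 is `weight_bytes(8.0e9, 4)`, about 4 GB of weights, far beyond what on-die SRAM holds.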


Family: Accelerator · Vision Processor
Quantisation: INT8 default · INT4 for LLMs on 10H
Runtime: HailoRT · x86 · ARM · Raspberry Pi
Compile: Dataflow Compiler → .hef
Five chips · Hailo lineup
Hailo-10H is the generative-AI flagship
Chip | Family | Performance | Power | Form | Best for | Released | Status
Hailo-8L | Accelerator | 13 TOPS (INT8) | ~1.5 W typical | M.2 / PCIe | Cost-sensitive edge: single-stream detection, smart home, POS, Raspberry Pi AI Kit | 2023 | Shipping
Hailo-8 | Accelerator | 26 TOPS (INT8) | ~2.5 W typical | M.2 / PCIe / SoM | Multi-stream CV: smart cameras, retail analytics, 26 TOPS Raspberry Pi AI HAT+ | 2021 | Shipping
Hailo-10H | Accelerator | 40 TOPS (INT4) | ~2.5 W typical | M.2 | On-device LLMs/VLMs, Llama 3 8B at 10+ tok/s, generative edge AI | 2025-07 | New
Hailo-15H | Vision Processor | 20 TOPS (INT8) | ~3-5 W | SoC (VPU) | High-end smart cameras with on-chip ISP + NN core | 2023 | Shipping
Hailo-15L | Vision Processor | 7 TOPS (INT8) | ~2 W | SoC (VPU) | Mass-market IP cameras replacing traditional SoCs | 2024 | Mass market
Fig 1 · Hailo-10H launched in July 2025 as the first Hailo accelerator with on-chip generative-AI capability. Hailo-8 remains the workhorse for multi-stream CV; it powers the 26 TOPS Raspberry Pi AI HAT+, while the original Raspberry Pi AI Kit carries the Hailo-8L.
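Perf-per-watt from the lineup table reduces to a division. This sketch assumes the table's typical-power figures and collapses the Hailo-15H's ~3-5 W range to its 4 W midpoint. It keeps each chip's precision label, because the 10H's INT4 TOPS are not directly comparable to INT8 TOPS.

```python
# Vendor-reported peak TOPS, precision and typical power from the Hailo lineup table.
# The Hailo-15H range (~3-5 W) is collapsed to its midpoint: an assumption.
LINEUP = {
    "Hailo-8L": (13, "INT8", 1.5),
    "Hailo-8": (26, "INT8", 2.5),
    "Hailo-10H": (40, "INT4", 2.5),
    "Hailo-15H": (20, "INT8", (3 + 5) / 2),
    "Hailo-15L": (7, "INT8", 2.0),
}


def tops_per_watt(tops: float, watts: float) -> float:
    """Peak TOPS divided by typical power: an upper bound, not delivered efficiency."""
    return tops / watts


for chip, (tops, precision, watts) in LINEUP.items():
    print(f"{chip:10s} {tops_per_watt(tops, watts):5.1f} TOPS/W ({precision})")
```

Peak TOPS per watt is a ceiling, and the catalog is a useful check on it: Hailo-10H has about 1.5× the Hailo-8's rated TOPS (40 vs 26), yet YOLOv8n runs at 275 vs 235 FPS, roughly 1.17×.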
§ 02 · Competitor silicon

The wider edge shelf.

Hailo is the primary lens of this register, but the edge shelf is wider. Jetson Orin owns the CUDA-native robotics lane; Google Coral is the old reliable for TFLite classifiers; Rockchip and Qualcomm ship inside SBCs and phones; the Apple Neural Engine quietly runs CoreML on every M-series machine.

Numbers come from vendor product pages. A fair head-to-head is still unsettled: workload, quantisation, batch size and toolchain all shift the answer.

Six competitor chips · reference only
Held against Hailo, not ranked
Chip | Vendor | Performance | Power envelope | Role
Jetson Orin Nano (8GB) | NVIDIA | 40 TOPS (INT8) | 7–15 W | CUDA-capable edge dev kit: robotics, vision, early LLM
Jetson AGX Orin (64GB) | NVIDIA | 275 TOPS (INT8) | 15–60 W | Autonomous machines · large VLMs on the edge
Coral Edge TPU (USB / M.2) | Google | 4 TOPS (INT8) | ~2 W | TFLite-only · classic classifier & detector workloads
Rockchip RK3588 NPU | Rockchip | 6 TOPS (INT8) | SoC envelope | SBC / mini-PC workhorse · ARM + NPU + GPU
Qualcomm QCS8550 | Qualcomm | ~48 TOPS (INT8) | SoC envelope | Hexagon NPU · IoT, robotics, Android-class devices
Apple Neural Engine (M-series) | Apple | ~18–38 TOPS (INT8) | SoC envelope | macOS / iOS on-device inference · CoreML
Fig 2 · Vendor-reported TOPS. For datacenter and consumer GPUs, see the deeper silicon register at /hardware.
§ 03 · Benchmarks summary

Eight workloads. One chip at a time.

The same chip is fast at vision and slow at OCR. Hailo-10H decodes an 8B LLM at an interactive 11 tok/s and, in the same envelope, clears 275 FPS on YOLOv8n. Each panel ends at a real number from the Hailo Model Zoo.

Fig 3 panels · Hailo-10H, published Model Zoo figure (higher is better)
Vision on edge · YOLOv8n · 275 FPS
Detection (newest) · YOLO26n · 250 FPS
Classification · ResNet-50 · 1,750 FPS
OCR · PaddleOCR-v5 · 65 FPS
LLM on 10H · 3B · Llama 3.2 · 28 tok/s
LLM on 10H · 8B · Llama 3.1 · 11 tok/s
Image embeddings · CLIP L/14 · 42 FPS
Face recognition · ArcFace R50 · 890 FPS
Fig 3 · Dot marks the figure published in the Hailo Model Zoo. Trend is directional — architectural lineage, not a per-submission history.
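Throughput figures translate into per-frame and per-response time with one division each. A minimal sketch: it ignores LLM prefill (prompt processing), which this register does not report, and treats 1000 / FPS as the mean frame interval. End-to-end latency can be higher when the runtime pipelines several frames.

```python
def frame_time_ms(fps: float) -> float:
    """Mean interval between completed frames, in milliseconds, at a given throughput."""
    return 1000.0 / fps


def decode_seconds(tokens: int, tok_per_s: float) -> float:
    """Seconds to decode `tokens` output tokens at a steady rate (prefill not included)."""
    return tokens / tok_per_s


print(f"YOLOv8n @ 275 FPS: {frame_time_ms(275):.2f} ms/frame")
print(f"Llama 3.1 8B, 200 tokens @ 11 tok/s: {decode_seconds(200, 11):.1f} s")
print(f"Llama 3.2 3B, 200 tokens @ 28 tok/s: {decode_seconds(200, 28):.1f} s")
```

A 200-token answer is roughly 18 s on the 8B model and about 7 s on the 3B, which is the practical gap between the two LLM panels.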
§ 04 · Model catalog

Pre-compiled, per chip.

FPS from the public Hailo Model Zoo. INT8, batch 1, reference boards. LLM rows are INT4 decode tok/s with 2K context on Hailo-10H. A dash means the model is not officially compiled for that chip.

Detection
Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
YOLOv11n | 640×640 | 2.6M | 135 | 210 | 260 | 240 | Latest YOLO nano, NMS on-chip
YOLOv11s | 640×640 | 9.4M | 72 | 140 | 175 | 160 | Balanced accuracy/speed
YOLOv11m | 640×640 | 20.1M | 38 | 70 | 95 | 85 | Higher mAP for demanding scenes
YOLOv8n | 640×640 | 3.2M | 150 | 235 | 275 | 255 | Most deployed edge detector
YOLOv8s | 640×640 | 11.2M | 78 | 150 | 180 | 165 |
Detection / Seg
Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
YOLO26n | 640×640 | 3.0M | - | - | 250 | 230 | NMS-free, newest family
Oriented BBox
Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
YOLOv11n-obb | 640×640 | 2.7M | - | - | 210 | 195 | Rotated boxes for aerial/industrial
Instance Seg
Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
YOLOv8n-seg | 640×640 | 3.4M | 85 | 155 | 190 | 175 |
YOLOv5n-seg-hpp | 640×640 | 2.0M | 120 | 195 | 230 | 215 | HailoRT-accelerated post-process
Pose
Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
YOLOv8n-pose | 640×640 | 3.3M | 88 | 160 | 195 | 180 | 17-keypoint human pose
Classification
Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
ResNet-50 | 224×224 | 25.6M | 720 | 1,390 | 1,750 | 1,500 | ImageNet reference
MobileNet V3 | 224×224 | 5.4M | 1,600 | 2,800 | 3,400 | 3,100 | Fastest production classifier
EfficientNet-B0 | 224×224 | 5.3M | 1,020 | 1,850 | 2,300 | 2,050 |
OCR
Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
PaddleOCR-v5 (det+rec) | Multi | ~12M | 22 | 45 | 65 | 58 | Latest PP-OCR pipeline
Face Detection
Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
RetinaFace MobileNet | 736×1280 | 0.4M | 85 | 140 | 165 | 150 |
Face Recognition
Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
ArcFace R50 | 112×112 | 43.6M | 380 | 720 | 890 | 800 |
Monocular Depth
Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
FastDepth | 224×224 | 3.9M | 380 | 640 | 790 | 710 |
Embeddings
Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
CLIP ViT-L/14 (Laion2B) | 224×224 | 304M | - | - | 42 | 28 | Image embeddings for retrieval
LLM (INT4)
Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes
Llama 3.2 3B | | 3.2B | - | - | 28 | - | tok/s decode, 2K ctx
Llama 3.1 8B | | 8.0B | - | - | 11 | - | tok/s decode, 2K ctx
Qwen 2.5 1.5B | | 1.5B | - | - | 45 | - | tok/s decode
Fig 4 · Frames per second, except LLM (INT4) rows — those are tokens per second at decode. Source of truth is the upstream Model Zoo; we republish for legibility only.
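The catalog doubles as a sizing table. Below is a minimal sketch, with a handful of rows copied from it and dashes stored as `None`, that returns the first chip in column order whose published FPS covers a multi-stream load. The helper name is hypothetical, and it assumes throughput splits evenly across streams, which real multi-stream scheduling may not achieve.

```python
from typing import Optional

CHIPS = ["Hailo-8L", "Hailo-8", "Hailo-10H", "Hailo-15H"]  # catalog column order

# FPS at INT8, batch 1, copied from the catalog; None marks a dash (not officially compiled).
CATALOG: dict[str, list[Optional[int]]] = {
    "YOLOv11n": [135, 210, 260, 240],
    "YOLOv11s": [72, 140, 175, 160],
    "YOLOv8n": [150, 235, 275, 255],
    "YOLO26n": [None, None, 250, 230],
    "ResNet-50": [720, 1390, 1750, 1500],
}


def first_chip_meeting(model: str, streams: int, fps_per_stream: float) -> Optional[str]:
    """First chip, in column order, whose published FPS covers streams * fps_per_stream.

    Hypothetical helper: assumes throughput splits evenly across streams.
    """
    needed = streams * fps_per_stream
    for chip, fps in zip(CHIPS, CATALOG[model]):
        if fps is not None and fps >= needed:
            return chip
    return None  # no listed chip reaches the target


print(first_chip_meeting("YOLOv11s", streams=4, fps_per_stream=30))
```

For four 30 FPS camera streams of YOLOv11s (120 FPS total) this returns `"Hailo-8"` (140 FPS); the Hailo-8L tops out at 72. Column order runs roughly entry-level to flagship and is not a price ranking, and the Hailo-15H is a camera SoC rather than a drop-in accelerator.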
§ 05 · Sub-pages

Deeper reads, per chip family.

Each sub-page carries its own evidence table. This hub is the index; the work is below.

Fig 5 · Sub-pages land one chip family at a time. Per-model HEF pages are the next beachhead — one URL per model × chip combo, with SHA256 and calibration notes.
§ 06 · What is HEF?

The binary that actually runs.

HEF — Hailo Executable Format — is the compiled binary that runs on a Hailo chip. You cannot load a PyTorch or ONNX model directly. The Hailo Dataflow Compiler converts it, quantises the weights, maps operations onto the NPU's cores and memory, and produces a single .hef file.

The compile step takes minutes to hours and needs a licence. Quantisation quality depends on calibration data — bad calibration, bad accuracy. Each chip has its own HEF; a Hailo-8 HEF does not run on a 10H. Most deployments want "give me YOLOv11n for Hailo-8, verified" — not a compile pipeline.
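Why calibration matters can be shown with the simplest scheme, symmetric per-tensor max-abs INT8 quantisation. This is a textbook sketch, not necessarily what the Dataflow Compiler does, and the activation values are hypothetical: a calibration set that is too narrow clips real inputs, and one with a single outlier stretches the scale until small values lose precision.

```python
def symmetric_int8_scale(calibration: list[float]) -> float:
    """Per-tensor symmetric scale: the largest |x| seen during calibration maps to 127."""
    peak = max(abs(x) for x in calibration)
    return peak / 127 if peak > 0 else 1.0


def quantise(x: float, scale: float) -> int:
    # round() is round-half-to-even; results saturate to the INT8 range.
    return max(-128, min(127, round(x / scale)))


def dequantise(q: int, scale: float) -> float:
    return q * scale


def mean_abs_error(values: list[float], scale: float) -> float:
    return sum(abs(v - dequantise(quantise(v, scale), scale)) for v in values) / len(values)


# Hypothetical activations the deployed model will actually see: -2.0 .. 2.0.
real_inputs = [0.1 * i for i in range(-20, 21)]

calibration_sets = {
    "representative": real_inputs,
    "too narrow": [0.1 * i for i in range(-2, 3)],  # -0.2 .. 0.2, so real inputs clip
    "one outlier": real_inputs + [50.0],            # scale stretched by a single value
}

errors = {name: mean_abs_error(real_inputs, symmetric_int8_scale(calib))
          for name, calib in calibration_sets.items()}
for name, err in errors.items():
    print(f"{name:15s} mean |error| = {err:.4f}")
```

The representative set keeps every error within half a quantisation step; both bad sets are worse by more than an order of magnitude.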

Pipeline: ONNX → hailo parser → HAR → hailo optimize (with calibration set) → quantised HAR → hailo compiler → .hef → HailoRT → ConfiguredNetworkGroup → inference.
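The pipeline reads as a chain of artifact types, with the target chip fixed by the time a .hef exists. Here is a toy model of that chain in plain Python. Every function and field in it is hypothetical (the real steps are the `hailo` CLI and HailoRT), and where exactly the chip is chosen is simplified to the compile step.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Artifact:
    kind: str                     # "onnx", "har", "quantised_har" or "hef"
    target: Optional[str] = None  # chip a .hef was compiled for (hypothetical field)


def _expect(artifact: Artifact, kind: str) -> None:
    # Each stage only accepts the output of the previous stage.
    if artifact.kind != kind:
        raise ValueError(f"expected {kind}, got {artifact.kind}")


def parse(model: Artifact) -> Artifact:
    _expect(model, "onnx")
    return Artifact("har")


def optimize(har: Artifact, calibration_set: Sequence[float]) -> Artifact:
    _expect(har, "har")
    if not calibration_set:
        raise ValueError("quantisation needs a calibration set")
    return Artifact("quantised_har")


def compile_for(qhar: Artifact, chip: str) -> Artifact:
    _expect(qhar, "quantised_har")
    return Artifact("hef", target=chip)


def load(hef: Artifact, device_chip: str) -> str:
    # A HEF is chip-specific: a Hailo-8 HEF does not run on a Hailo-10H.
    _expect(hef, "hef")
    if hef.target != device_chip:
        raise RuntimeError(f"HEF built for {hef.target} cannot run on {device_chip}")
    return f"network group configured on {device_chip}"


hef = compile_for(optimize(parse(Artifact("onnx")), calibration_set=[0.0, 0.5]), "Hailo-8")
print(load(hef, "Hailo-8"))
```

Calling `load(hef, "Hailo-10H")` raises, which mirrors why a verified per-chip HEF is the useful unit of distribution.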

§ 07 · Related

Neighbouring registers.

Hub pages across Codesota worth reading next.

Hardware · GPU register
Consumer and datacenter accelerators, priced honestly.
LLM · register
Frontier language models — the cloud counterpart to 10H LLMs.
Vision · router
Detection, classification and seg benchmarks across backbones.
Speech · register
ASR and TTS — the speech half of the edge pipeline.
Audio · register
Audio classification, captioning, music generation.
All benchmarks
Every tracked task across every modality.
Methodology
How Codesota reproduces and publishes results.
Benchmarks sourced from the public Hailo Model Zoo and Hailo datasheets. Page maintained by CodeSOTA.