Codesota · Embedded AI · Vol. II
The register of edge silicon: every watt, every TOP
Issue: April 22, 2026

§ 00 · Embedded AI

On the edge, measured.

The independent register for NPUs that leave the datacenter behind: Hailo-8, 8L, 10H, 15H and 15L first, with Jetson Orin, Coral, Rockchip, Qualcomm and the Apple Neural Engine held next to them. Numbers are taken from the public Hailo Model Zoo: INT8 (or INT4 for LLMs on 10H), batch 1, on reference boards.

5 Hailo chips covered · 21 benchmarked models · 9 detection / seg variants · 3 on-device LLMs on Hailo-10H.

Chip catalog → · Model zoo throughput → · Every LLM on Hailo-10H →

§ 01 · Hailo chips

Seven TOPS to forty.

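Hailo's vision NPUs keep all model memory on die, with no external DRAM lookups during inference. That decision buys predictable latency and very high perf-per-watt; it also means the model has to fit, or be partitioned by the Hailo compiler. Multi-billion-parameter LLMs on the 10H are the exception: their weights are too large for on-die memory and are served from DRAM.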
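Whether "the model has to fit" is mostly arithmetic on the weights. The sketch below estimates weight storage only, as parameter count times bits per weight; it ignores activations and compiler overhead, and it assumes nothing about how much memory any given chip has. Parameter counts are the ones listed in the model catalog on this page.

```python
def weight_bytes(params: float, bits: int) -> float:
    """Bytes needed to store `params` weights quantised to `bits` bits each."""
    return params * bits / 8


# (model, parameter count, weight precision in bits), from this page's catalog
models = [
    ("YOLOv8n", 3.2e6, 8),
    ("ResNet-50", 25.6e6, 8),
    ("Llama 3.1 8B", 8.0e9, 4),
]

for name, params, bits in models:
    megabytes = weight_bytes(params, bits) / 1e6
    print(f"{name}: about {megabytes:,.1f} MB of weights at INT{bits}")
```

The three-orders-of-magnitude gap between a nano detector and an 8B LLM is why the two classes of model are handled so differently.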

Family: Accelerator · Vision Processor
Quantisation: INT8 default · INT4 for LLMs on 10H
Runtime: HailoRT · x86 · ARM · Raspberry Pi
Compile: Dataflow Compiler → .hef

Five chips · Hailo lineup

Shaded row marks the generative-AI flagship

Chip	Family	Performance	Power	Form	Best for	Released	Status
Hailo-8L	Accelerator	13 TOPS (INT8)	~1.5 W typical	M.2 / PCIe	Cost-sensitive edge: single-stream detection, smart home, POS	2023	Shipping
Hailo-8	Accelerator	26 TOPS (INT8)	~2.5 W typical	M.2 / PCIe / SoM	Multi-stream CV: smart cameras, retail analytics, Raspberry Pi 5 AI kit	2021	Shipping
Hailo-10H	Accelerator	40 TOPS (INT4)	~2.5 W typical	M.2	On-device LLMs/VLMs, Llama 3.1 8B at 10+ tok/s, generative edge AI	2025-07	New
Hailo-15H	Vision Processor	20 TOPS (INT8)	~3-5 W	SoC (VPU)	High-end smart cameras with on-chip ISP + NN core	2023	Shipping
Hailo-15L	Vision Processor	7 TOPS (INT8)	~2 W	SoC (VPU)	Mass-market IP cameras replacing traditional SoCs	2024	Mass market

Fig 1 · Hailo-10H launched July 2025 as the first Hailo accelerator with on-chip generative-AI capability. Hailo-8 remains the workhorse for multi-stream CV — it ships in the official Raspberry Pi 5 AI Kit.
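Perf-per-watt can be read straight off the lineup table. A minimal sketch, assuming the "typical" power figures are the right denominator and treating the Hailo-15H range as a low and high bound; note that the 10H figure is INT4 TOPS and is not directly comparable with the INT8 rows.

```python
# (chip, vendor TOPS, precision, (low watts, high watts)) copied from the lineup table
chips = [
    ("Hailo-8L", 13, "INT8", (1.5, 1.5)),
    ("Hailo-8", 26, "INT8", (2.5, 2.5)),
    ("Hailo-10H", 40, "INT4", (2.5, 2.5)),
    ("Hailo-15H", 20, "INT8", (3.0, 5.0)),
    ("Hailo-15L", 7, "INT8", (2.0, 2.0)),
]


def tops_per_watt(tops, watts):
    """Return (worst case, best case) TOPS/W for a (low, high) power range."""
    low_w, high_w = watts
    return tops / high_w, tops / low_w


for name, tops, precision, watts in chips:
    worst, best = tops_per_watt(tops, watts)
    if worst == best:
        print(f"{name}: {best:.1f} TOPS/W ({precision})")
    else:
        print(f"{name}: {worst:.1f} to {best:.1f} TOPS/W ({precision})")
```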

§ 02 · Competitor silicon

The wider edge shelf.

Hailo is the primary lens of this register, but the edge shelf is wider. Jetson Orin owns the CUDA-native robotics lane; Google Coral is the old reliable for TFLite classifiers; Rockchip and Qualcomm ship inside SBCs and phones; the Apple Neural Engine quietly runs CoreML on every M-series machine.

Numbers come from vendor product pages. Direct head-to-head comparison remains an open problem: workloads, quantisation, batch size and toolchains all shift the answer.

Six competitor chips · reference only

Held against Hailo, not ranked

Chip	Vendor	Performance	Power envelope	Role
Jetson Orin Nano (8GB)	NVIDIA	40 TOPS (INT8)	7–15 W	CUDA-capable edge dev kit — robotics, vision, early LLM
Jetson AGX Orin (64GB)	NVIDIA	275 TOPS (INT8)	15–60 W	Autonomous machines · large VLMs on the edge
Coral Edge TPU (USB / M.2)	Google	4 TOPS (INT8)	~2 W	TFLite-only · classic classifier & detector workloads
Rockchip RK3588 NPU	Rockchip	6 TOPS (INT8)	SoC envelope	SBC / mini-PC workhorse · ARM + NPU + GPU
Qualcomm QCS8550	Qualcomm	~48 TOPS (INT8)	SoC envelope	Hexagon NPU · IoT, robotics, Android-class devices
Apple Neural Engine (M-series)	Apple	~18–38 TOPS (INT8)	SoC envelope	macOS / iOS on-device inference · CoreML

Fig 2 · Vendor-reported TOPS. For a deeper silicon register see /hardware — datacenter and consumer GPUs.

§ 03 · Benchmarks summary

Eight workloads. One chip at a time.

The same chip is fast at vision and slow at OCR; Hailo-10H decodes an 8B LLM at an interactive 11 tok/s and, in the same power envelope, reaches 275 FPS on YOLOv8n. Each panel ends at a real number from the Hailo Model Zoo.

Panel	Model	Value	Unit
Vision on edge	YOLOv8n	275	FPS ↑
Detection (newest)	YOLO26n	250	FPS ↑
Classification	ResNet-50	1,750	FPS ↑
OCR	PaddleOCR-v5	65	FPS ↑
LLM on 10H · 3B	Llama 3.2	28	tok/s ↑
LLM on 10H · 8B	Llama 3.1	11	tok/s ↑
Image embeddings	CLIP L/14	42	FPS ↑
Face recognition	ArcFace R50	890	FPS ↑

Fig 3 · Dot marks the figure published in the Hailo Model Zoo. Trend is directional — architectural lineage, not a per-submission history.
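The LLM figures are easier to judge as wall-clock time. A rough sketch that converts the decode rates above into seconds per reply; the 256-token reply length is an illustrative assumption, and prefill (prompt processing) time is not included because this page does not publish it.

```python
def decode_seconds(tokens: int, tok_per_s: float) -> float:
    """Time spent in the decode phase only, at a steady tokens-per-second rate."""
    return tokens / tok_per_s


REPLY_TOKENS = 256  # hypothetical reply length, chosen for illustration

# Decode rates on Hailo-10H as published on this page
rates = {"Llama 3.2 3B": 28, "Llama 3.1 8B": 11}

for model, rate in rates.items():
    seconds = decode_seconds(REPLY_TOKENS, rate)
    print(f"{model}: about {seconds:.1f} s to decode {REPLY_TOKENS} tokens")
```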

§ 04 · Model catalog

Pre-compiled, per chip.

FPS from the public Hailo Model Zoo. INT8, batch 1, reference boards. LLM rows are INT4 decode tok/s with 2K context on Hailo-10H. A dash means the model is not officially compiled for that chip.

Detection

Model	Input	Params	Hailo-8L	Hailo-8	Hailo-10H	Hailo-15H	Notes
YOLOv11n	640×640	2.6M	135	210	260	240	Latest YOLO nano, NMS on-chip
YOLOv11s	640×640	9.4M	72	140	175	160	Balanced accuracy/speed
YOLOv11m	640×640	20.1M	38	70	95	85	Higher mAP for demanding scenes
YOLOv8n	640×640	3.2M	150	235	275	255	Most deployed edge detector
YOLOv8s	640×640	11.2M	78	150	180	165

Detection / Seg

Model	Input	Params	Hailo-8L	Hailo-8	Hailo-10H	Hailo-15H	Notes
YOLO26n	640×640	3.0M	—	—	250	230	NMS-free, newest family

Oriented BBox

Model	Input	Params	Hailo-8L	Hailo-8	Hailo-10H	Hailo-15H	Notes
YOLOv11n-obb	640×640	2.7M	—	—	210	195	Rotated boxes for aerial/industrial

Instance Seg

Model	Input	Params	Hailo-8L	Hailo-8	Hailo-10H	Hailo-15H	Notes
YOLOv8n-seg	640×640	3.4M	85	155	190	175
YOLOv5n-seg-hpp	640×640	2.0M	120	195	230	215	HailoRT-accelerated post-process

Pose

Model	Input	Params	Hailo-8L	Hailo-8	Hailo-10H	Hailo-15H	Notes
YOLOv8n-pose	640×640	3.3M	88	160	195	180	17-keypoint human pose

Classification

Model	Input	Params	Hailo-8L	Hailo-8	Hailo-10H	Hailo-15H	Notes
ResNet-50	224×224	25.6M	720	1,390	1,750	1,500	ImageNet reference
MobileNet V3	224×224	5.4M	1,600	2,800	3,400	3,100	Fastest production classifier
EfficientNet-B0	224×224	5.3M	1,020	1,850	2,300	2,050

OCR

Model	Input	Params	Hailo-8L	Hailo-8	Hailo-10H	Hailo-15H	Notes
PaddleOCR-v5 (det+rec)	Multi	~12M	22	45	65	58	Latest PP-OCR pipeline

Face Detection

Model	Input	Params	Hailo-8L	Hailo-8	Hailo-10H	Hailo-15H	Notes
RetinaFace MobileNet	736×1280	0.4M	85	140	165	150

Face Recognition

Model	Input	Params	Hailo-8L	Hailo-8	Hailo-10H	Hailo-15H	Notes
ArcFace R50	112×112	43.6M	380	720	890	800

Monocular Depth

Model	Input	Params	Hailo-8L	Hailo-8	Hailo-10H	Hailo-15H	Notes
FastDepth	224×224	3.9M	380	640	790	710

Embeddings

Model	Input	Params	Hailo-8L	Hailo-8	Hailo-10H	Hailo-15H	Notes
CLIP ViT-L/14 (LAION-2B)	224×224	304M	—	—	42	28	Image embeddings for retrieval

LLM (INT4)

Model	Input	Params	Hailo-8L	Hailo-8	Hailo-10H	Hailo-15H	Notes
Llama 3.2 3B	—	3.2B	—	—	28	—	tok/s decode, 2K ctx
Llama 3.1 8B	—	8.0B	—	—	11	—	tok/s decode, 2K ctx
Qwen 2.5 1.5B	—	1.5B	—	—	45	—	tok/s decode

Fig 4 · Frames per second, except LLM (INT4) rows — those are tokens per second at decode. Source of truth is the upstream Model Zoo; we republish for legibility only.
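For anyone lifting these tables into a script, the two details that need handling are the thousands separator and the dash for "not officially compiled". A small sketch, assuming the rows are tab-separated in the column order shown above; the helper names are illustrative and not part of any Hailo tool.

```python
CHIPS = ["Hailo-8L", "Hailo-8", "Hailo-10H", "Hailo-15H"]


def parse_fps(cell: str):
    """Return the cell as a float, or None for the catalog's 'not compiled' dash."""
    cell = cell.strip()
    if cell in ("—", ""):
        return None
    return float(cell.replace(",", ""))


def parse_row(line: str) -> dict:
    """Split one catalog row: Model, Input, Params, then one cell per chip."""
    cells = line.split("\t")
    row = {"model": cells[0]}
    for chip, cell in zip(CHIPS, cells[3:7]):
        row[chip] = parse_fps(cell)
    return row


def speedup(row: dict, chip: str, baseline: str):
    """Throughput ratio chip/baseline, or None if either chip has no figure."""
    if row[chip] is None or row[baseline] is None:
        return None
    return row[chip] / row[baseline]


rows = [parse_row(line) for line in [
    "ResNet-50\t224×224\t25.6M\t720\t1,390\t1,750\t1,500\tImageNet reference",
    "YOLO26n\t640×640\t3.0M\t—\t—\t250\t230\tNMS-free, newest family",
]]

for row in rows:
    ratio = speedup(row, "Hailo-10H", "Hailo-8L")
    label = "n/a" if ratio is None else f"{ratio:.2f}x"
    print(f"{row['model']}: Hailo-10H vs Hailo-8L = {label}")
```

Returning None rather than zero for a dash keeps a missing build from being mistaken for a measured result.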

§ 05 · Sub-pages

Deeper reads, per chip family.

Each sub-page carries its own evidence table. This hub is the index; the work is below.

Deep-dive

Every LLM on Hailo-10H

Llama 3.1 8B · Llama 3.2 3B · Qwen 2.5 · Qwen3 — decode tok/s, quality, quantisation.

Read the deep-dive →

Fig 5 · Sub-pages land one chip family at a time. Per-model HEF pages are the next beachhead — one URL per model × chip combo, with SHA256 and calibration notes.
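A published SHA256 per model × chip combination is only useful if the consumer checks it. The sketch below shows one way that check could look; the manifest layout (a dict keyed by model and chip) and the function names are hypothetical, since the per-model pages do not exist yet.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_hef(path: Path, model: str, chip: str, manifest: dict) -> bool:
    """True only if (model, chip) is listed and the file's hash matches.

    `manifest` is a hypothetical mapping {(model, chip): expected_sha256_hex}.
    """
    expected = manifest.get((model, chip))
    return expected is not None and sha256_of(path) == expected
```

Keying the manifest on both model and chip means a file built for one chip fails the lookup for another, even when the model name matches.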

§ 06 · What is HEF?

The binary that actually runs.

HEF — Hailo Executable Format — is the compiled binary that runs on a Hailo chip. You cannot load a PyTorch or ONNX model directly. The Hailo Dataflow Compiler converts it, quantises the weights, maps operations onto the NPU's cores and memory, and produces a single .hef file.

The compile step takes minutes to hours and needs a licence. Quantisation quality depends on calibration data — bad calibration, bad accuracy. Each chip has its own HEF; a Hailo-8 HEF does not run on a 10H. Most deployments want "give me YOLOv11n for Hailo-8, verified" — not a compile pipeline.

Pipeline: ONNX → hailo parser → HAR → hailo optimize (with calibration set) → quantised HAR → hailo compiler → .hef → HailoRT → ConfiguredNetworkGroup → inference.
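The pipeline is a chain of artifact hand-offs: each step consumes what the previous one produced. The sketch below models only that chain as plain data. It does not call the Hailo toolchain, and the stage names are descriptive labels rather than exact command names.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    name: str      # descriptive label, not a CLI command
    consumes: str  # artifact the stage reads
    produces: str  # artifact the stage writes


PIPELINE = [
    Stage("parse", "ONNX", "HAR"),
    Stage("optimize (needs calibration set)", "HAR", "quantised HAR"),
    Stage("compile", "quantised HAR", ".hef"),
    Stage("load in HailoRT", ".hef", "ConfiguredNetworkGroup"),
]


def artifact_chain(stages):
    """Return the ordered artifacts, raising if two stages do not connect."""
    for prev, nxt in zip(stages, stages[1:]):
        if prev.produces != nxt.consumes:
            raise ValueError(f"{prev.name!r} produces {prev.produces!r}, "
                             f"but {nxt.name!r} expects {nxt.consumes!r}")
    return [stages[0].consumes] + [stage.produces for stage in stages]


print(" -> ".join(artifact_chain(PIPELINE)))
```

Skipping the optimize stage makes the check fail, which mirrors the real constraint: the compile step expects a quantised HAR, not the raw parsed one.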

§ 07 · Related

Neighbouring registers.

Hub pages across Codesota worth reading next.

Hardware · GPU register →

Consumer and datacenter accelerators, priced honestly.

LLM · register →

Frontier language models — the cloud counterpart to 10H LLMs.

Vision · router →

Detection, classification and seg benchmarks across backbones.

Speech · register →

ASR and TTS — the speech half of the edge pipeline.

Audio · register →

Audio classification, captioning, music generation.

All benchmarks →

Every tracked task across every modality.

Methodology →

How Codesota reproduces and publishes results.

Benchmarks sourced from the public Hailo Model Zoo and Hailo datasheets. Page maintained by Codesota.