How we measure silicon.
Most hardware charts cherry-pick a batch size, a quantisation, or a thermal envelope. Ours are built from three ordinary rules.
First, repeatable workloads. Each benchmark names the exact model revision, the exact precision or quantisation scheme (FP16, INT8, or INT4 via GPTQ or AWQ), the batch size, and the context length. Every run is reproducible from a single container.
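One way to make "names the exact" enforceable is to refuse to run a benchmark until every one of those fields is pinned. The sketch below is a minimal illustration under that assumption; the names `BenchmarkSpec`, `Precision`, `label`, and the container tag format are hypothetical and not taken from any specific harness.

```python
from dataclasses import dataclass
from enum import Enum


class Precision(Enum):
    # The precisions named in the text; GPTQ and AWQ are two INT4 schemes.
    FP16 = "fp16"
    INT8 = "int8"
    INT4_GPTQ = "int4-gptq"
    INT4_AWQ = "int4-awq"


@dataclass(frozen=True)
class BenchmarkSpec:
    """Hypothetical record of everything a benchmark must pin down."""
    model: str             # model name
    model_revision: str    # exact revision, e.g. a commit hash (assumed format)
    precision: Precision   # precision or quantisation scheme
    batch_size: int
    context_length: int    # in tokens
    container_image: str   # the single image the run is reproduced from

    def __post_init__(self) -> None:
        # Reject specs that leave the workload ambiguous.
        if not self.model_revision:
            raise ValueError("model_revision must be pinned")
        if not self.container_image:
            raise ValueError("container_image must be pinned")
        if self.batch_size < 1 or self.context_length < 1:
            raise ValueError("batch_size and context_length must be positive")

    def label(self) -> str:
        # Stable, human-readable identifier for a chart row.
        return (f"{self.model}@{self.model_revision} "
                f"{self.precision.value} bs={self.batch_size} "
                f"ctx={self.context_length}")


spec = BenchmarkSpec("example-model", "abc1234", Precision.INT8,
                     8, 4096, "registry.example/bench:tag")
print(spec.label())  # example-model@abc1234 int8 bs=8 ctx=4096
```

Freezing the dataclass makes each spec hashable, so it can serve directly as the key of a results table.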
Second, matched quantisation. An RTX 5090 at INT4 versus an H100 at FP16 is not a comparison: it measures two different models. Cross-card numbers in the same row always run at the same precision.
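The same-row rule can be checked mechanically before a chart is drawn. This is a minimal sketch assuming a hypothetical `Result` record with just a card name and a precision string; a real results schema would carry the measured metrics as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One card's entry in a chart row (hypothetical schema)."""
    card: str
    precision: str  # e.g. "fp16", "int8", "int4-awq"


def check_row(results: list[Result]) -> str:
    """Return the precision shared by every card in a row.

    Raises ValueError if the row is empty or mixes precisions, since an
    INT4 run and an FP16 run are effectively two different models.
    """
    if not results:
        raise ValueError("empty row")
    precisions = {r.precision for r in results}
    if len(precisions) != 1:
        raise ValueError(f"mixed precisions in one row: {sorted(precisions)}")
    return precisions.pop()


print(check_row([Result("RTX 5090", "int8"), Result("H100", "int8")]))  # int8
```

Rejecting the row outright, rather than annotating it, keeps an INT4-versus-FP16 pairing out of the chart entirely.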
Third, latency with p95. Throughput is the headline; the footnote is p95 latency, the time within which every request except the slowest one in twenty completes. We report both in the same row.
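To make "p95" concrete, the sketch below computes it with the nearest-rank definition and reports it next to throughput. Both choices are assumptions: throughput is taken here as generated tokens per second of wall-clock time, and other percentile definitions (for example linear interpolation) give slightly different values on small samples. The function names are hypothetical.

```python
import math


def percentile_nearest_rank(values: list[float], p: float) -> float:
    """Smallest sample such that at least p percent of samples are <= it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarise(latencies_s: list[float], tokens_generated: int,
              wall_time_s: float) -> dict[str, float]:
    """Throughput and tail latency reported together, as one row."""
    return {
        "throughput_tok_s": tokens_generated / wall_time_s,
        "p50_latency_s": percentile_nearest_rank(latencies_s, 50),
        "p95_latency_s": percentile_nearest_rank(latencies_s, 95),
    }


# Twenty illustrative request latencies: 0.1 s, 0.2 s, ..., 2.0 s.
row = summarise([i / 10 for i in range(1, 21)], tokens_generated=4000,
                wall_time_s=10.0)
print(row)  # p95 is 1.9 s: only the slowest request (2.0 s) exceeds it
```

With twenty samples, exactly one request lies above the p95 value, which is the "one in twenty" the row footnote describes.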
Cloud prices are verified directly with the provider on the date printed at the top of the issue. Vendor TFLOPS are dense FP16 from the published datasheet — not sparse, not "effective," not marketing.