Who leads the KITAB-Bench benchmark?

Gemini 2.0 Flash currently leads KITAB-Bench with a score of 0.13 on Character Error Rate (lower is better).

What is the state-of-the-art score on KITAB-Bench?

The state-of-the-art result on KITAB-Bench is 0.13 (Character Error Rate), achieved by Gemini 2.0 Flash as of 2026.

How many models are tracked on KITAB-Bench?

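Codesota lists 18 entries on KITAB-Bench. Four models (Gemini 2.0 Flash, AIN 7B, GPT-4o mini, and Azure OCR) appear twice under different trust tiers, so the table covers 14 distinct models.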

When was the KITAB-Bench leaderboard last updated?

The KITAB-Bench leaderboard on Codesota includes results through 2026, with the earliest tracked result from 2025.


MBZUAI

KITAB-Bench.

Name: KITAB-Bench Benchmark Results
Creator: MBZUAI
Published: 2025-01-01
License: https://creativecommons.org/licenses/by/4.0/

8,809 Arabic text samples across 9 domains. Tests Arabic script recognition.


§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


Character Error Rate

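Levenshtein (edit) distance between the predicted text and the ground truth, divided by the length of the ground truth (lower is better). Because insertions count as errors, a model that emits much more text than the reference can score above 1.0.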
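The metric described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own scoring code: the function names `levenshtein` and `character_error_rate` are hypothetical, characters are counted as Unicode code points, and no text normalization (for example diacritic stripping) is applied, which the official evaluation may handle differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(prediction: str, reference: str) -> float:
    """Edit distance normalized by reference length (assumed CER definition)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(prediction, reference) / len(reference)


# One missing character out of four in the reference.
print(character_error_rate("كتب", "كتاب"))    # 0.25
# Over-generation: six extra characters against a two-character reference.
print(character_error_rate("abcdefgh", "ab"))  # 3.0
```

The second call shows why leaderboard scores such as 4.95 are possible: the numerator grows with every spurious character, while the denominator is fixed by the reference.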


Trust tiers for Character Error Rate: verified, paper, vendor, community, unverified.

Muted rows were not state of the art when published — an earlier or same-year result already scored better.

Rank	Model	Trust	CER (lower is better)	Year
01	Gemini 2.0 Flash	unverified	0.13	2025
02	gemini-20-flash	paper	0.13	2025
03	ain-7b	paper	0.20	2025
04	AIN 7B	unverified	0.20	2025
05	GPT-4o	unverified	0.31	2025
06	GPT-4o mini	unverified	0.43	2025
07	gpt-4o-mini	paper	0.43	2025
08	azure-ocr	paper	0.52	2025
09	Azure OCR	unverified	0.52	2025
10	tesseract	paper	0.54	2025
11	easyocr	paper	0.58	2025
12	PaddleOCR	unverified	0.79	2025
13	Gemma 3	verified	1.05	2026
14	qwen2.5-vl-7b	verified	1.20	2026
15	Qwen2-VL 7B	verified	1.48	2026
16	Qaari (specialized Arabic OCR model)	verified	1.80	2026
17	ArabicNougat (specialized Arabic document model)	verified	4.37	2026
18	Surya	verified	4.95	2026
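The muting rule stated above the table can be expressed as a short check. The sketch below is one possible reading of that rule, not Codesota's implementation: the `Result` class and `muted_rows` function are hypothetical names, and "scored better" is assumed to mean a strictly lower CER, so ties are not muted.

```python
from dataclasses import dataclass


@dataclass
class Result:
    model: str
    score: float  # Character Error Rate, lower is better
    year: int


def muted_rows(results: list[Result]) -> list[str]:
    """Models that were not state of the art when published.

    A row is muted if some other result from the same or an earlier
    year has a strictly lower score (assumed interpretation).
    """
    muted = []
    for r in results:
        beaten = any(
            o is not r and o.year <= r.year and o.score < r.score
            for o in results
        )
        if beaten:
            muted.append(r.model)
    return muted


# A subset of the leaderboard rows, copied from the table.
rows = [
    Result("Gemini 2.0 Flash", 0.13, 2025),
    Result("AIN 7B", 0.20, 2025),
    Result("Gemma 3", 1.05, 2026),
    Result("Surya", 4.95, 2026),
]
print(muted_rows(rows))  # ['AIN 7B', 'Gemma 3', 'Surya']
```

Under this reading, every 2026 entry in the table is muted, since the 0.13 result from 2025 already scored better.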

Lineage

KITAB-Bench in context.


Predecessors (1)

active · 2024-12

OCRBench v2

KITAB-Bench is an Arabic-script-specific fork of OCRBench v2: it keeps the same VLM-era 'comprehensive OCR' framing but applies it to a script that English-centric benchmarks under-cover. Even the best frontier model still posts a CER of 0.13.

This benchmark (1)

active · 2025-02

KITAB-Bench

Successors: none yet. This is the current frontier.
