OCR benchmark (higher is better)

| # | Model | Score | Source |
|---|---|---|---|
| 1 ★ | Ultravox-GLM-4P7 | 88.86 | codesota-api |
| 2 | Whisper-v3-large + GPT-4o (cascade) | 87.80 | codesota-api |
| 3 | GPT-4o-Audio | 86.75 | codesota-api |
| 4 | Whisper-v3-large + LLaMA-3.1-8B (cascade) | 77.48 | codesota-api |
| 5 | Kimi-Audio | 76.91 | codesota-api |
| 6 | MiniCPM-o | 71.23 | codesota-api |
| 7 | VITA-1.5 | 64.53 | codesota-api |
| 8 | Qwen2-Audio | 55.80 | codesota-api |
| 9 | LLaMA-Omni | 41.12 | codesota-api |
| 10 | VITA-1.0 | 36.43 | codesota-api |
| 11 | Mini-Omni2 | 33.49 | codesota-api |
| 12 | Mini-Omni | 30.42 | codesota-api |
| 13 | Moshi | 29.51 | codesota-api |