AI Arena
Rankings
Which AI is actually the best? Judged not by benchmarks but by real human preferences: millions of blind comparisons across text, code, vision, search, image generation, and video.
Text
General conversational AI. Which model gives the best text responses?
Anthropic leads, with thinking variants at the top.
Code
Code generation, debugging, and explanation. The coding assistant benchmark.
Claude holds the top 5 positions. GLM-5 (MIT) is the best open-source model.
Vision
Image understanding and multimodal reasoning over visual inputs.
Google dominates: 4 of the top 6 are Gemini models.
Document
PDF, spreadsheet, and document understanding. Long-context analysis.
Context window size correlates strongly with ranking.
Search
AI-powered web search. Grounding, citations, and real-time information.
Grok-4-fast offers the best value at $0.20/$0.50 per million tokens (input/output).
Text-to-Image
Image generation from text prompts. Quality, style, and prompt adherence.
Google and OpenAI lead. Flux and Seedream are strong challengers.
Text-to-Video
Video generation from text descriptions. Motion, coherence, and audio.
Audio generation is the new frontier; Veo 3.1 with audio dominates.
How Arena Rankings Work
Blind Comparison
Users submit a prompt. Two anonymous models respond. The user picks the better response without knowing which model is which.
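As a minimal sketch of that flow in Python (the schema and function names here are illustrative, not arena.ai's actual API):

```python
import random
from dataclasses import dataclass

@dataclass
class Battle:
    prompt: str
    model_a: str      # identities stay hidden from the voter
    model_b: str
    winner: str = ""  # "a", "b", or "tie", filled in by the vote

def new_battle(prompt: str, models: list[str]) -> Battle:
    """Pair two distinct models at random for a blind comparison."""
    model_a, model_b = random.sample(models, 2)
    return Battle(prompt, model_a, model_b)

def record_vote(battle: Battle, choice: str) -> Battle:
    """Record the voter's pick; model names are revealed only after this point."""
    battle.winner = choice
    return battle
```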
Bradley-Terry Model
Votes are aggregated using the Bradley-Terry model (similar to Elo in chess). Each comparison updates both models' scores based on expected vs actual outcomes.
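A sketch of the Elo-style online form of that update, in Python. Production leaderboards typically refit the Bradley-Terry model over the full vote history rather than updating incrementally, and the K-factor and scale below are illustrative values, not arena.ai's:

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Bradley-Terry win probability of A over B: logistic in the rating gap
    (the same functional form Elo uses)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

def update_ratings(rating_a: float, rating_b: float, score_a: float,
                   k: float = 4.0) -> tuple[float, float]:
    """Move both ratings toward the observed outcome.
    score_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie."""
    e_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - e_a)
    return rating_a + delta, rating_b - delta
```

Starting two models at 1500, a single win with k=4 moves the winner to 1502 and the loser to 1498; upsets against higher-rated opponents move scores further.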
Statistical Significance
Each score includes a 95% confidence interval. Models with overlapping CIs are statistically tied. More votes = tighter intervals.
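A common way to get those intervals is bootstrapping the vote log. A minimal sketch, assuming a hypothetical `fit_ratings` function that recomputes scores from a list of battles (for example, by replaying the update above):

```python
import random

def bootstrap_ci(battles: list, model: str, fit_ratings,
                 n_boot: int = 1000, alpha: float = 0.05) -> tuple[float, float]:
    """Resample battles with replacement, refit ratings each time, and take
    empirical quantiles of the model's score as a (1 - alpha) interval."""
    scores = sorted(
        fit_ratings(random.choices(battles, k=len(battles)))[model]
        for _ in range(n_boot)
    )
    lo = scores[int(n_boot * alpha / 2)]
    hi = scores[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi
```

With more votes, the resampled fits vary less, which is why high-volume models end up with tighter intervals.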
Who Leads Where?
| Category | Leader | Provider | Score | Best Value ($ in / $ out per 1M tokens) |
|---|---|---|---|---|
| Text | Claude Opus 4.6 | Anthropic | 1502 | Grok-4.1 ($0.20/$0.50) |
| Code | Claude Opus 4.6 | Anthropic | 1548 | GLM-5 ($1/$3.20, MIT) |
| Vision | Gemini 3 Pro | Google | 1290 | Gemini 3 Flash ($0.50/$3) |
| Document | Claude Opus 4.6 | Anthropic | 1524 | Claude Haiku 4.5 ($1/$5) |
| Search | Claude Opus 4.6 Search | Anthropic | 1255 | Grok-4-fast ($0.20/$0.50) |
| Text-to-Image | Gemini 3.1 Flash Image | Google | 1266 | qwen-image (Apache 2.0) |
| Text-to-Video | Veo 3.1 Audio 1080p | Google | 1381 | Kandinsky 5.0 (MIT) |
Anthropic leads Text, Code, Document, and Search. Google leads Vision, Image Generation, and Video Generation.
Data sourced from arena.ai (formerly LMSYS Chatbot Arena). Rankings are based on millions of blind human preference votes.
Last updated: March 2026. Scores and rankings change as new votes come in.