AI Arena Rankings

Which AI is actually the best? Not benchmarks — real human preferences. Millions of blind comparisons across text, code, vision, search, image generation, and video.

How Arena Rankings Work

1. Blind Comparison. Users submit a prompt, two anonymous models respond, and the user picks the better response without knowing which model is which.

2. Bradley-Terry Model. Votes are aggregated with the Bradley-Terry model (a close relative of chess Elo ratings): each model is assigned a score so that the head-to-head win probabilities the scores predict best match the actual vote outcomes. A minimal sketch of this fitting follows the list.

3. Statistical Significance. Each score carries a 95% confidence interval; models with overlapping intervals are statistically tied, and more votes yield tighter intervals. The sketch below also shows how such intervals can be bootstrapped.
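Steps 2 and 3 are easy to make concrete. The sketch below is self-contained and illustrative, not Arena's actual pipeline: it fits Bradley-Terry scores to a toy set of votes by gradient ascent on the log-likelihood, then bootstraps 95% confidence intervals by resampling the votes. The Elo-like rescaling (400 / ln 10 points, centered at 1000) is a common leaderboard convention and an assumption here.

```python
import numpy as np

def fit_bradley_terry(votes, n_models, n_iters=2000, lr=0.5):
    """Fit Bradley-Terry log-strengths by gradient ascent on the log-likelihood.

    votes: iterable of (winner_idx, loser_idx) pairs from blind comparisons.
    Returns scores on an Elo-like scale (400 / ln 10 points, centered at 1000);
    that rescaling is a common convention, assumed here.
    """
    win = np.array([v[0] for v in votes])
    los = np.array([v[1] for v in votes])
    theta = np.zeros(n_models)                 # log-strengths
    for _ in range(n_iters):
        # P(winner beats loser) = sigmoid(theta_win - theta_los)
        p = 1.0 / (1.0 + np.exp(theta[los] - theta[win]))
        grad = np.zeros(n_models)
        np.add.at(grad, win, 1.0 - p)          # d logL / d theta_winner
        np.add.at(grad, los, -(1.0 - p))       # d logL / d theta_loser
        theta += lr * grad / len(win)
        theta -= theta.mean()                  # scores are only relative
    return 1000.0 + theta * 400.0 / np.log(10.0)

def bootstrap_ci(votes, n_models, n_boot=200, seed=0):
    """95% confidence intervals by resampling the votes with replacement."""
    rng = np.random.default_rng(seed)
    votes = np.asarray(votes)
    fits = [fit_bradley_terry(votes[rng.integers(0, len(votes), len(votes))],
                              n_models)
            for _ in range(n_boot)]
    return np.percentile(fits, [2.5, 97.5], axis=0)

# Toy data: model 0 beats model 1 about 70% of the time; both beat model 2.
votes = ([(0, 1)] * 70 + [(1, 0)] * 30 +
         [(0, 2)] * 85 + [(2, 0)] * 15 +
         [(1, 2)] * 60 + [(2, 1)] * 40)
scores = fit_bradley_terry(votes, n_models=3)
lo, hi = bootstrap_ci(votes, n_models=3)
for m in range(3):
    print(f"model {m}: {scores[m]:.0f}  95% CI [{lo[m]:.0f}, {hi[m]:.0f}]")
# Models whose intervals overlap are statistically tied.
```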

Who Leads Where?

| Category | Leader | Provider | Score | Best Value |
|---|---|---|---|---|
| Text | Claude Opus 4.6 | Anthropic | 1502 | Grok-4.1 ($0.20/$0.50) |
| Code | Claude Opus 4.6 | Anthropic | 1548 | GLM-5 ($1/$3.20, MIT) |
| Vision | Gemini 3 Pro | Google | 1290 | Gemini 3 Flash ($0.50/$3) |
| Document | Claude Opus 4.6 | Anthropic | 1524 | Claude Haiku 4.5 ($1/$5) |
| Search | Claude Opus 4.6 Search | Anthropic | 1255 | Grok-4-fast ($0.20/$0.50) |
| Text-to-Image | Gemini 3.1 Flash Image | Google | 1266 | qwen-image (Apache 2.0) |
| Text-to-Video | Veo 3.1 Audio 1080p | Google | 1381 | Kandinsky 5.0 (MIT) |

Best Value prices are input/output rates (conventionally quoted per 1M tokens); open-weight models list their license instead.

Anthropic leads Text, Code, Document, and Search; Google leads Vision, Text-to-Image, and Text-to-Video.
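A score gap on this kind of scale has a direct win-rate reading. Assuming the conventional base-10 Elo curve (an assumption; Arena's exact scaling may differ), the higher-rated model's expected win rate is 1 / (1 + 10^(-gap/400)):

```python
def win_prob(gap: float) -> float:
    """Expected win rate for the higher-rated model, assuming the
    standard base-10 Elo curve (a convention, not Arena's published spec)."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))

for gap in (25, 50, 100, 200):
    print(f"{gap:>3}-point gap -> {win_prob(gap):.0%} expected win rate")
# 25 -> 54%, 50 -> 57%, 100 -> 64%, 200 -> 76%
```

By this convention, even a clear leader is only a modest favorite head-to-head: a 50-point lead wins about 57% of matchups, which is why models with overlapping confidence intervals read as ties.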

Data sourced from arena.ai (formerly LMSYS Chatbot Arena). Rankings are based on millions of blind human preference votes.

Last updated: March 2026. Scores and rankings change as new votes come in.