AI Video Generation Arena: Text-to-Video Rankings
Human-preference Elo rankings across 37 text-to-video models, computed from 246,000+ pairwise votes. Veo 3.1 with native audio generation dominates. Sora 2 Pro holds 4th. Open-source models (Kandinsky MIT, Wan Apache 2.0) punch above their weight.
Audio Generation Changes Everything
The gap between Veo 3.1 with audio (Elo 1381) and Veo 3 without audio (Elo 1257) is 124 Elo points, a wide margin. Within the same generation, veo-3-audio (1341) leads veo-3 (1257) by 84 points. All six Google models in the top 10 include native audio synthesis. Sora 2 Pro (4th, Elo 1367) is the only non-audio model in the top 5, and even it falls below three of the four Veo 3.1 audio variants. Audio, not motion quality or resolution, is now the primary differentiator at the frontier.
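To give the Elo gaps above a more intuitive meaning, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the conventional logistic Elo model with a 400-point scale; the source does not state which scale the arena uses, so the function name `expected_win_rate` and the `scale` default are illustrative assumptions.

```python
def expected_win_rate(elo_a: float, elo_b: float, scale: float = 400.0) -> float:
    """Probability that model A is preferred over model B.

    Uses the standard logistic Elo formula. The 400-point scale is an
    assumption; the arena's exact parameters are not given in the source.
    """
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / scale))


# Elo values quoted in the text above.
veo_31_audio = 1381
veo_3_no_audio = 1257
sora_2_pro = 1367

p_audio_vs_silent = expected_win_rate(veo_31_audio, veo_3_no_audio)
p_veo_vs_sora = expected_win_rate(veo_31_audio, sora_2_pro)

print(f"Veo 3.1 audio vs Veo 3 (no audio): {p_audio_vs_silent:.3f}")
print(f"Veo 3.1 audio vs Sora 2 Pro:       {p_veo_vs_sora:.3f}")
```

Under these assumptions, a 124-point gap corresponds to the higher-rated model being preferred in roughly two out of three matchups, while the 14-point gap to Sora 2 Pro is close to a coin flip.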
Sora 2 Analysis
Sora 2 Pro (Elo 1367) ranks 4th globally with 18,963 votes and a ±9 confidence interval, so its rating is well supported statistically. Sora 2 standard (1342, 9th) trails by 25 Elo. OpenAI's video quality is genuine, but without audio generation Sora 2 Pro sits behind three Google variants and Sora 2 behind five. The path to #1 runs through native audio.
Open-Source Highlights
Full Leaderboard
37 models · 246,000+ human votes · Elo-ranked
| # | Model | Vendor | Elo Score | 95% CI | Votes | License |
|---|---|---|---|---|---|---|
| 1 | veo-3.1-audio-1080p (+ Audio) | Google | 1381 | ±16 | 5.5K | Proprietary |
| 2 | veo-3.1-fast-audio-1080p (+ Audio) | Google | 1378 | ±14 | 5.7K | Proprietary |
| 3 | veo-3.1-audio (+ Audio) | Google | 1371 | ±14 | 12.6K | Proprietary |
| 4 | sora-2-pro | OpenAI | 1367 | ±9 | 19.0K | Proprietary |
| 5 | veo-3.1-fast-audio (+ Audio) | Google | 1366 | ±11 | 25.4K | Proprietary |
| 6 | grok-imagine-video-720p | xAI | 1358 | ±9 | 33.7K | Proprietary |
| 7 | veo-3-fast-audio (+ Audio) | Google | 1351 | ±11 | 25.8K | Proprietary |
| 8 | wan2.6-t2v | Alibaba | 1347 | ±17 | 6.4K | Proprietary |
| 9 | sora-2 | OpenAI | 1342 | ±8 | 25.2K | Proprietary |
| 10 | veo-3-audio (+ Audio) | Google | 1341 | ±12 | 19.3K | Proprietary |
| 11 | wan2.5-t2v-preview | Alibaba | 1268 | ±17 | 6.1K | Proprietary |
| 12 | veo-3 | Google | 1257 | ±11 | 15.2K | Proprietary |
| 13 | seedance-v1.5-pro | Bytedance | 1255 | ±8 | 31.6K | Proprietary |
| 14 | veo-3-fast | Google | 1251 | ±12 | 15.5K | Proprietary |
| 15 | pixverse-v5.6 | Pixverse | 1228 | ±14 | 2.3K | Proprietary |
| 16 | kling-2.5-turbo-1080p | KlingAI | 1221 | ±17 | 2.1K | Proprietary |
| 17 | kling-2.6-pro | KlingAI | 1219 | ±8 | 38.7K | Proprietary |
| 18 | runway-gen-4.5 | Runway | 1214 | ±11 | 3.9K | Proprietary |
| 19 | kling-o1-pro | KlingAI | 1208 | ±27 | 1.2K | Proprietary |
| 20 | ray-3 | Luma AI | 1204 | ±23 | 1.1K | Proprietary |
| 21 | hailuo-02-pro | MiniMax | 1200 | ±12 | 9.9K | Proprietary |
| 22 | hailuo-2.3 | MiniMax | 1196 | ±8 | 26.8K | Proprietary |
| 23 | seedance-v1-pro | Bytedance | 1192 | ±11 | 12.9K | Proprietary |
| 24 | hailuo-02-standard | MiniMax | 1181 | ±12 | 9.9K | Proprietary |
| 25 | p-video | Pruna | 1180 | ±15 | 3.6K | Proprietary |
| 26 | kandinsky-5.0-t2v-pro (OSS) | Kandinsky | 1179 | ±21 | 1.9K | MIT |
| 27 | hunyuan-video-1.5 (OSS) | Tencent | 1171 | ±16 | 4.1K | Community |
| 28 | kling-v2.1-master | KlingAI | 1168 | ±9 | 14.5K | Proprietary |
| 29 | veo-2 | Google | 1166 | ±16 | 7.1K | Proprietary |
| 30 | wan-v2.2-a14b (OSS) | Alibaba | 1130 | ±15 | 11.2K | Apache 2.0 |
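
Because many neighbouring ranks differ by only a few Elo points, it helps to check which models are actually separated from the leader once the 95% CI column is taken into account. The sketch below uses the Elo and CI values from the top 10 rows of the table and flags models whose interval overlaps the leader's. Interval overlap is a rough heuristic chosen here for illustration, not a formal significance test, and the helper name `intervals_overlap` is hypothetical.

```python
# (rank, model, elo, ci_half_width) copied from the top 10 rows of the table.
top10 = [
    (1, "veo-3.1-audio-1080p", 1381, 16),
    (2, "veo-3.1-fast-audio-1080p", 1378, 14),
    (3, "veo-3.1-audio", 1371, 14),
    (4, "sora-2-pro", 1367, 9),
    (5, "veo-3.1-fast-audio", 1366, 11),
    (6, "grok-imagine-video-720p", 1358, 9),
    (7, "veo-3-fast-audio", 1351, 11),
    (8, "wan2.6-t2v", 1347, 17),
    (9, "sora-2", 1342, 8),
    (10, "veo-3-audio", 1341, 12),
]


def intervals_overlap(elo_a: int, ci_a: int, elo_b: int, ci_b: int) -> bool:
    """True when [elo_a - ci_a, elo_a + ci_a] and [elo_b - ci_b, elo_b + ci_b] intersect."""
    return abs(elo_a - elo_b) <= ci_a + ci_b


_, leader_name, leader_elo, leader_ci = top10[0]

# Models whose interval overlaps the leader's, i.e. not clearly separated from #1.
overlapping = [
    name
    for _, name, elo, ci in top10[1:]
    if intervals_overlap(leader_elo, leader_ci, elo, ci)
]

print(f"Not clearly separated from {leader_name}:")
for name in overlapping:
    print(f"  {name}")
```

By this heuristic, ranks 2 through 6 all overlap with the leader, so the ordering within the top six should be read as approximate.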
Vendor Breakdown
Which companies dominate text-to-video AI in 2026?
Google: Dominant across the board. Audio-enabled Veo 3.1 variants occupy ranks 1–3 and 5. The only company with native audio-video synthesis at the frontier.
OpenAI: Sora 2 Pro holds a strong 4th. High vote count (18K+) gives statistical confidence. Lacks audio generation, which is the main gap vs. Veo.
xAI: Surprising 6th-place finish with 33K+ votes, the most votes among top-10 models. Solid video quality from a company new to the space.
Alibaba: Three models in the leaderboard, including one Apache 2.0 open-weight release (Wan v2.2). Strong contender in the mid-to-upper tier.
KlingAI: Four models in the leaderboard, the most of any vendor other than Google among the rows shown. Kling 2.6 Pro has the highest vote count overall at 38K+. Consistent mid-tier performer.
MiniMax: Hailuo series (three variants) clusters around Elo 1181–1200. High vote counts signal genuine user engagement. A Chinese vendor to watch.
Frequently Asked Questions
What is the best AI video generator in 2026?
How does Sora 2 compare to Veo 3?
Are there open-source text-to-video models worth using?
What is an Elo rating in AI video generation?
Which company leads AI video generation?
Does audio generation matter for video AI rankings?
Methodology
Rankings are derived from pairwise human preference votes collected in an arena format. Participants are shown two videos generated from the same prompt by two randomly selected models and vote for the one they prefer. The Elo rating system (adapted from competitive chess) computes skill ratings from win/loss outcomes. A model's Elo rises with every win and falls with every loss; the change is larger when the result is unexpected, that is, when it defeats a higher-rated opponent or loses to a lower-rated one.
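The rating update described above can be sketched as follows. This is a minimal illustration of the classic online Elo rule, assuming a 400-point logistic scale and a K-factor of 32; the arena's actual fitting procedure and constants are not stated in the source, and the function names are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected probability that A beats B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_after_vote(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one pairwise vote.

    k=32 is an assumed K-factor chosen for illustration only.
    """
    surprise = 1.0 - expected_score(winner, loser)  # large when the win was unexpected
    delta = k * surprise
    return winner + delta, loser - delta


# An upset: the 1200-rated model wins against the 1400-rated model.
upset_winner, upset_loser = update_after_vote(1200.0, 1400.0)

# An expected result: the 1400-rated model wins against the 1200-rated model.
fav_winner, fav_loser = update_after_vote(1400.0, 1200.0)

print(f"Upset:    winner +{upset_winner - 1200.0:.1f}, loser {upset_loser - 1400.0:.1f}")
print(f"Expected: winner +{fav_winner - 1400.0:.1f}, loser {fav_loser - 1200.0:.1f}")
```

The upset moves both ratings by considerably more than the expected result does, and in both cases the points gained by the winner equal the points lost by the loser.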
The 95% confidence interval (±CI) reflects the uncertainty in the Elo estimate given the available vote count. Models with fewer than ~2,000 votes carry wider confidence intervals and should be interpreted with caution. All 37 models in this leaderboard had at least 1,000 votes as of March 2026.
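To illustrate why fewer votes mean wider intervals, the sketch below bootstraps a 95% interval for the Elo gap between two models from synthetic head-to-head votes. The source does not say how the arena computes its intervals, so the bootstrap, the 400-point scale, the 55% true win rate, and all function names here are assumptions for illustration; the votes are simulated, not arena data.

```python
import math
import random


def elo_gap(win_rate: float) -> float:
    """Elo difference implied by a head-to-head win rate (400-point logistic scale)."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


def bootstrap_ci_half_width(
    n_votes: int,
    true_win_rate: float = 0.55,
    n_resamples: int = 200,
    seed: int = 0,
) -> float:
    """Half-width of a bootstrap 95% interval for the Elo gap, using synthetic votes."""
    rng = random.Random(seed)
    # Synthetic votes: 1 means model A was preferred, 0 means model B.
    votes = [1 if rng.random() < true_win_rate else 0 for _ in range(n_votes)]

    gaps = []
    for _ in range(n_resamples):
        resample = rng.choices(votes, k=n_votes)  # sample with replacement
        rate = sum(resample) / n_votes
        # Clamp so the logarithm stays finite even for a degenerate resample.
        rate = min(max(rate, 1e-6), 1.0 - 1e-6)
        gaps.append(elo_gap(rate))

    gaps.sort()
    lower = gaps[int(0.025 * n_resamples)]
    upper = gaps[int(0.975 * n_resamples) - 1]
    return (upper - lower) / 2.0


wide = bootstrap_ci_half_width(n_votes=1_000)
narrow = bootstrap_ci_half_width(n_votes=10_000)

print(f"Half-width with  1,000 votes: ±{wide:.1f} Elo")
print(f"Half-width with 10,000 votes: ±{narrow:.1f} Elo")
```

The interval shrinks roughly with the square root of the vote count, which is consistent with the pattern in the table where the models with about 1K votes carry the widest ± values.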
Data source: VideoGen-Eval arena · 246,000+ total votes · 37 models evaluated · Last updated March 2026. CodeSOTA republishes rankings with editorial context; we do not modify the underlying Elo scores.