Live Arena · Updated March 2026

AI Video Generation Arena:
Text-to-Video Rankings

Human-preference Elo rankings across 37 text-to-video models, computed from 246,000+ pairwise votes. Veo 3.1 with native audio generation dominates. Sora 2 Pro holds 4th. Open-source models (Kandinsky MIT, Wan Apache 2.0) punch above their weight.

37 Models Ranked

246K+ Human Votes

1381 Top Elo Score

Audio Generation Changes Everything

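The gap between Veo 3.1 with audio (Elo 1381) and Veo 3 without audio (Elo 1257) is 124 Elo points. That comparison mixes two model generations, but the gap persists within one generation: veo-3-audio (1341) leads veo-3 (1257) by 84 points, and veo-3-fast-audio (1351) leads veo-3-fast (1251) by 100. All six Google models in the top 10 include native audio synthesis. Sora 2 Pro (4th, Elo 1367) is the only non-audio model in the top 5, and it trails three of the four Veo 3.1 audio variants, although its 14-point deficit to first place is within the listed confidence intervals (±9 and ±16). In this arena, audio is the clearest differentiator at the frontier, ahead of motion quality or resolution.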
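To put these gaps in perspective, an Elo difference can be converted into an expected head-to-head win rate. The sketch below assumes the conventional logistic Elo curve on a 400-point scale; the page does not state which scale the arena uses, so treat the percentages as approximate. The function name `expected_win_rate` is illustrative, and the scores are copied from the leaderboard on this page.

```python
def expected_win_rate(elo_a: float, elo_b: float) -> float:
    """Expected score of model A against model B under the standard Elo formula.

    Assumption: logistic curve with a 400-point scale, as in chess Elo.
    """
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


# Elo scores taken from the leaderboard table on this page.
pairs = {
    "veo-3.1-audio-1080p vs veo-3": (1381, 1257),
    "veo-3-audio vs veo-3": (1341, 1257),
    "veo-3-fast-audio vs veo-3-fast": (1351, 1251),
    "veo-3.1-audio-1080p vs sora-2-pro": (1381, 1367),
}

for label, (a, b) in pairs.items():
    rate = expected_win_rate(a, b)
    print(f"{label}: gap {a - b:+d} Elo, expected win rate {rate:.1%}")
```

Under this assumption a 124-point gap means the higher-rated model is expected to win roughly two out of three comparisons, while a 14-point gap is close to a coin flip.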

Sora 2 Analysis

Sora 2 Pro (Elo 1367) ranks 4th with 18,963 votes and a tight ±9 confidence interval. Sora 2 standard (1342, 9th) trails it by 25 Elo. OpenAI's video quality is genuine, but without audio generation Sora 2 Pro sits behind three Google audio variants and Sora 2 behind five. The path to #1 likely runs through native audio.

Sora 2 Pro: 1367

Sora 2: 1342

Sora 2 Pro vs Veo 3.1 Audio (1080p): −14

Open-Source Highlights

Kandinsky 5.0 T2V Pro

MIT · Rank #26 · Elo 1179

MIT licensed. Trails Runway Gen 4.5 (1214) by 35 Elo, with no license fee.

HunyuanVideo 1.5

Community · Rank #27 · Elo 1171

Released by Tencent under a community license. Strong motion quality for local inference.

Wan v2.2 A14B

Apache 2.0 · Rank #30 · Elo 1130

Apache 2.0 licensed, which permits commercial use. Alibaba's open-weight release.

Full Leaderboard

37 models · 246,000+ human votes · Elo-ranked · top 30 shown

Legend: + Audio = native audio · OSS = open license

#	Model	Vendor	Elo Score	95% CI	Votes	License
1	veo-3.1-audio-1080p (+ Audio)	Google	1381	±16	5.5K	Proprietary
2	veo-3.1-fast-audio-1080p (+ Audio)	Google	1378	±14	5.7K	Proprietary
3	veo-3.1-audio (+ Audio)	Google	1371	±14	12.6K	Proprietary
4	sora-2-pro	OpenAI	1367	±9	19.0K	Proprietary
5	veo-3.1-fast-audio (+ Audio)	Google	1366	±11	25.4K	Proprietary
6	grok-imagine-video-720p	xAI	1358	±9	33.7K	Proprietary
7	veo-3-fast-audio (+ Audio)	Google	1351	±11	25.8K	Proprietary
8	wan2.6-t2v	Alibaba	1347	±17	6.4K	Proprietary
9	sora-2	OpenAI	1342	±8	25.2K	Proprietary
10	veo-3-audio (+ Audio)	Google	1341	±12	19.3K	Proprietary
11	wan2.5-t2v-preview	Alibaba	1268	±17	6.1K	Proprietary
12	veo-3	Google	1257	±11	15.2K	Proprietary
13	seedance-v1.5-pro	Bytedance	1255	±8	31.6K	Proprietary
14	veo-3-fast	Google	1251	±12	15.5K	Proprietary
15	pixverse-v5.6	Pixverse	1228	±14	2.3K	Proprietary
16	kling-2.5-turbo-1080p	KlingAI	1221	±17	2.1K	Proprietary
17	kling-2.6-pro	KlingAI	1219	±8	38.7K	Proprietary
18	runway-gen-4.5	Runway	1214	±11	3.9K	Proprietary
19	kling-o1-pro	KlingAI	1208	±27	1.2K	Proprietary
20	ray-3	Luma AI	1204	±23	1.1K	Proprietary
21	hailuo-02-pro	MiniMax	1200	±12	9.9K	Proprietary
22	hailuo-2.3	MiniMax	1196	±8	26.8K	Proprietary
23	seedance-v1-pro	Bytedance	1192	±11	12.9K	Proprietary
24	hailuo-02-standard	MiniMax	1181	±12	9.9K	Proprietary
25	p-video	Pruna	1180	±15	3.6K	Proprietary
26	kandinsky-5.0-t2v-pro (OSS)	Kandinsky	1179	±21	1.9K	MIT
27	hunyuan-video-1.5 (OSS)	Tencent	1171	±16	4.1K	Community
28	kling-v2.1-master	KlingAI	1168	±9	14.5K	Proprietary
29	veo-2	Google	1166	±16	7.1K	Proprietary
30	wan-v2.2-a14b (OSS)	Alibaba	1130	±15	11.2K	Apache 2.0


Vendor Breakdown

Which companies dominate text-to-video AI in 2026?

Google · 9 models (of the 30 listed)

Top model: veo-3.1-audio-1080p

Best Elo: 1381

Dominant across the board. Audio-enabled Veo 3.1 variants occupy ranks 1–3 and 5. Google is the only vendor in this leaderboard with models flagged for native audio generation.

OpenAI · 2 models

Top model: sora-2-pro

Best Elo: 1367

Sora 2 Pro holds a strong 4th. Its high vote count (19.0K) gives statistical confidence. It lacks audio generation, which is the main gap vs. Veo.

xAI · 1 model

Top model: grok-imagine-video-720p

Best Elo: 1358

Surprising 6th-place finish with 33.7K votes, the most among top-10 models. Solid video quality from a company new to the space.

Alibaba · 3 models

Top model: wan2.6-t2v

Best Elo: 1347

Three models in the leaderboard, including one Apache 2.0 open-weight release (Wan v2.2 A14B). Strong contender in the mid-to-upper tier.

KlingAI · 4 models

Top model: kling-2.5-turbo-1080p

Best Elo: 1221

Four models in the leaderboard, second only to Google. Kling 2.6 Pro has the highest vote count among the models listed at 38.7K. Consistent mid-tier performer.

MiniMax · 3 models

Top model: hailuo-02-pro

Best Elo: 1200

Hailuo series (three variants) clusters around Elo 1181–1200. High vote counts (9.9K to 26.8K) signal genuine user engagement. Chinese model to watch.

Frequently Asked Questions

What is the best AI video generator in 2026?

Google Veo 3.1 with audio at 1080p (Elo 1381) leads the text-to-video leaderboard in 2026, followed by Veo 3.1 Fast with audio at 1080p (1378), Veo 3.1 with audio (1371), and Sora 2 Pro (1367). Veo models with integrated audio generation take five of the top seven places, though the top four are separated by only 14 Elo points, which is within their confidence intervals.

How does Sora 2 compare to Veo 3?

Sora 2 Pro (Elo 1367) ranks 4th overall, behind three Veo 3.1 variants that include native audio generation. Sora 2 (standard, Elo 1342) ranks 9th. Google's Veo 3 without audio (Elo 1257) ranks 12th, below both Sora models, which suggests that audio generation, not just video quality, is the key differentiator at the top.

Are there open-source text-to-video models worth using?

Yes. Kandinsky 5.0 T2V Pro (Elo 1179) is MIT licensed and ranks 26th overall, within 2 Elo points of the proprietary p-video (1180) and hailuo-02-standard (1181). HunyuanVideo 1.5 (Elo 1171, 27th) from Tencent is released under a community license. Wan v2.2 A14B (Elo 1130) is Apache 2.0 licensed and ranks 30th. The first two sit within 45 Elo of mid-tier commercial offerings such as Runway Gen 4.5 (1214).

What is an Elo rating in AI video generation?

Elo scores in video generation arenas are calculated from pairwise human preference votes: viewers compare two generated videos side by side and vote for the better one. The Elo system (originally from chess) computes a skill rating from win/loss records. A higher Elo means humans prefer that model's outputs more often. All ratings here are sourced from the VideoGen-Eval arena, with 246,000+ votes across 37 models.

Which company leads AI video generation?

Google leads AI video generation in 2026 with 9 of the 30 models listed above, including the top 3 spots (Veo 3.1 audio variants). OpenAI's Sora 2 Pro holds 4th place. Chinese companies (Alibaba with Wan, Bytedance with Seedance, MiniMax with Hailuo, KlingAI, and Tencent with HunyuanVideo) occupy most of the mid-tier, showing strong competition from the Chinese AI ecosystem.

Does audio generation matter for video AI rankings?

Yes, by a wide margin. The gap between Veo 3.1 with audio (Elo 1381) and Veo 3 without audio (Elo 1257) is 124 Elo points, which under the standard Elo formula corresponds to an expected head-to-head win rate of about 67% for the higher-rated model. Six of the top 10 models have native audio generation, and the gap between audio-capable and silent video models is the largest capability split visible in the 2026 arena data.

Methodology

Rankings are derived from pairwise human preference votes collected in an arena format. Participants are shown two videos generated from the same prompt by two randomly selected models and vote for the one they prefer. The Elo rating system (adapted from competitive chess) computes skill ratings from win/loss outcomes. A win always raises a model's rating and a loss always lowers it; the size of the change depends on how expected the result was, so beating a higher-rated opponent earns more than beating a lower-rated one, and losing to a lower-rated opponent costs more.
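The rating mechanics described above can be sketched as a classic sequential Elo update. This is a minimal illustration, not the arena's actual procedure: the page does not say whether ratings are updated vote by vote or fitted in batch, and the K-factor of 32 and the 400-point scale are assumptions borrowed from chess. The names `expected_score` and `update_elo` are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the logistic Elo model (400-point scale assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after a single pairwise vote.

    k is an assumed step size; larger values make ratings react faster to each vote.
    """
    surprise = 1.0 - expected_score(winner, loser)  # near 0 for an expected win, near 1 for an upset
    delta = k * surprise
    return winner + delta, loser - delta


# Favorite wins: a 1381-rated model beats a 1257-rated one (small adjustment).
fav_new, under_new = update_elo(1381.0, 1257.0)
print(f"favorite wins: {fav_new:.1f} / {under_new:.1f}")

# Upset: the 1257-rated model beats the 1381-rated one (larger adjustment).
upset_new, fallen_new = update_elo(1257.0, 1381.0)
print(f"upset:         {upset_new:.1f} / {fallen_new:.1f}")
```

Each update moves the two ratings by equal and opposite amounts, so the total rating in the pool stays constant, and an upset shifts ratings more than an expected result.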

The 95% confidence interval (±CI) reflects the uncertainty in the Elo estimate given the available vote count. Models with fewer than ~2,000 votes carry wider confidence intervals and should be interpreted with caution. All 37 models in this leaderboard had at least 1,000 votes as of March 2026.
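One rough way to use the ± column is to check whether two models' intervals overlap before treating a rank difference as meaningful. The sketch below does only that, using values from the leaderboard on this page; the `Rating` type and `intervals_overlap` helper are illustrative names. Interval overlap is a conservative heuristic rather than a formal significance test, and the page does not describe how the arena computes its intervals.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    elo: float
    ci: float  # half-width of the reported 95% confidence interval


def intervals_overlap(a: Rating, b: Rating) -> bool:
    """True when the two reported 95% intervals share at least one point."""
    return abs(a.elo - b.elo) <= a.ci + b.ci


# Values copied from the leaderboard table on this page.
veo_31_audio_1080p = Rating(elo=1381, ci=16)
sora_2_pro = Rating(elo=1367, ci=9)
veo_3 = Rating(elo=1257, ci=11)

# 14-point gap vs. combined half-widths of 25: the intervals overlap.
print(intervals_overlap(veo_31_audio_1080p, sora_2_pro))

# 110-point gap vs. combined half-widths of 20: clearly separated.
print(intervals_overlap(sora_2_pro, veo_3))
```

By this check the ordering among the top four models is uncertain, while the separation between the audio-capable leaders and the silent Veo 3 is well outside the reported uncertainty.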

Data source: VideoGen-Eval arena · 246,000+ total votes · 37 models evaluated · Last updated March 2026. CodeSOTA republishes rankings with editorial context; we do not modify the underlying Elo scores.