Arena · Text-to-Image · 4.3M human votes · 51 models

AI Image Generation Arena: Text-to-Image Rankings

Elo ratings computed from 4.3 million pairwise human preference votes across 51 text-to-image models. Users were shown two unlabeled images generated from the same prompt and asked which they preferred, so brand names could not sway the vote. Updated 2025.

Total votes: 4.3M
Models evaluated: 51
Top Elo score: 1266
Methodology: Elo rating

Google leads overall

5 of the top 15 slots are Google models: #1, #3, #4, #12, and #15.

Flux 2 dominates mid-tier

Black Forest Labs places 4 Flux 2 variants in the top 15, at ranks 8, 10, 11, and 14.


Apache 2.0 options exist

Qwen-Image-2512 (#18, Elo 1136) and Z-Image-Turbo (#30, Elo 1076) are released under the permissive Apache 2.0 license.

Full Leaderboard — Top 30

Showing the top 30 of 51 models. Elo scores use the standard logistic scale, on which a 400-point gap corresponds to 10:1 expected odds of being preferred. CI is the 95% confidence interval on each Elo estimate.

Rank · Model · Elo · Vendor · License · ±95% CI · Votes
1 · gemini-3.1-flash-image-preview · 1266 · Google · Proprietary · ±7 · 15K votes
2 · gpt-image-1.5-high-fidelity · 1244 · OpenAI · Proprietary · ±4 · 63K votes
3 · gemini-3-pro-image-preview-2k · 1235 · Google · Proprietary · ±5 · 58K votes
4 · gemini-3-pro-image-preview · 1232 · Google · Proprietary · ±5 · 83K votes
5 · mai-image-2 · 1189 · Microsoft AI · Proprietary · ±8 · 6K votes
6 · reve-v1.5 · 1177 · Reve · Proprietary · ±6 · 8K votes
7 · grok-imagine-image · 1173 · xAI · Proprietary · ±4 · 49K votes
8 · flux-2-max · 1167 · Black Forest Labs · Proprietary · ±4 · 66K votes
9 · grok-imagine-image-pro · 1160 · xAI · Proprietary · ±4 · 48K votes
10 · flux-2-flex · 1158 · Black Forest Labs · Proprietary · ±4 · 102K votes
11 · flux-2-pro · 1157 · Black Forest Labs · Proprietary · ±4 · 97K votes
12 · gemini-2.5-flash-image · 1154 · Google · Proprietary · ±3 · 696K votes
13 · hunyuan-image-3.0 · 1151 · Tencent · Community · ±3 · 173K votes
14 · flux-2-dev · 1150 · Black Forest Labs · Proprietary · ±5 · 50K votes
15 · imagen-ultra-4.0 · 1147 · Google · Proprietary · ±4 · 390K votes
16 · seedream-4.5 · 1145 · Bytedance · Proprietary · ±4 · 102K votes
17 · seedream-4-2k · 1141 · Bytedance · Proprietary · ±6 · 13K votes
18 · qwen-image-2512 · 1136 · Alibaba · Apache 2.0 · ±4 · 48K votes
19 · wan2.6-t2i · 1135 · Alibaba · Proprietary · ±4 · 43K votes
20 · imagen-4.0 · 1133 · Google · Proprietary · ±3 · 462K votes
21 · seedream-4-fal · 1117 · Bytedance · Proprietary · ±6 · 12K votes
22 · wan2.5-t2i-preview · 1115 · Alibaba · Proprietary · ±4 · 138K votes
23 · gpt-image-1 · 1115 · OpenAI · Proprietary · ±3 · 266K votes
24 · seedream-4-high-res · 1114 · Bytedance · Proprietary · ±4 · 117K votes
25 · seedream-5.0-lite · 1113 · Bytedance · Proprietary · ±5 · 21K votes
26 · gpt-image-1-mini · 1103 · OpenAI · Proprietary · ±4 · 106K votes
27 · recraft-v4 · 1102 · Recraft · Proprietary · ±7 · 14K votes
28 · mai-image-1 · 1093 · Microsoft AI · Proprietary · ±4 · 94K votes
29 · seedream-3 · 1082 · Bytedance · Proprietary · ±5 · 37K votes
30 · z-image-turbo · 1076 · Alibaba · Apache 2.0 · ±6 · 12K votes

[Chart: Elo Score Distribution, Top 30 (plots the same scores as the leaderboard)]

Google vs Everyone

Google holds the top of the leaderboard. Gemini 3.1 Flash Image Preview achieves an Elo of 1266, a full 22 points ahead of OpenAI's GPT-Image-1.5 High Fidelity in second place.
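To put a 22-point lead in perspective, the standard Elo expected-score formula turns a rating gap into the probability that voters prefer one model's image. A minimal sketch, assuming the leaderboard follows the conventional base-10, 400-point logistic scale; `expected_win_prob` is an illustrative name, not part of any arena API:

```python
def expected_win_prob(elo_a: float, elo_b: float, scale: float = 400.0) -> float:
    """Probability that model A is preferred over model B under the Elo model.

    Uses the conventional logistic curve: a gap of `scale` points means 10:1 odds.
    """
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / scale))


# Rank 1 vs rank 2 on this leaderboard (1266 vs 1244).
p = expected_win_prob(1266, 1244)
print(f"{p:.3f}")  # → 0.532
```

Under that assumption, the top model would be preferred in roughly 53% of head-to-head votes against the runner-up: a consistent edge, but far from a sweep.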

What makes this more striking is range: Google holds 6 positions in the top 20, spanning Gemini 3.1 Flash (#1), Gemini 3 Pro (#3, #4), Gemini 2.5 Flash (#12), Imagen Ultra 4.0 (#15), and Imagen 4.0 (#20). No other vendor matches this breadth; the next closest, Black Forest Labs, has 4.

Google models in top 20

gemini-3.1-flash-image-preview · 1266
gemini-3-pro-image-preview-2k · 1235
gemini-3-pro-image-preview · 1232
gemini-2.5-flash-image · 1154
imagen-ultra-4.0 · 1147
imagen-4.0 · 1133

Black Forest Labs: Flux 2 Sweep

Black Forest Labs launched the original Flux in 2024 and immediately disrupted the open-weight image generation market. Flux 2 consolidates that lead: four variants cluster tightly between Elo 1150 and 1167, ranking 8th, 10th, 11th, and 14th.

The tight clustering (a 17-point spread across 4 models) may indicate that BFL is near a capability ceiling for the current architecture. Flux 2 Max (Elo 1167) edges ahead on quality at the cost of inference speed, while Flux 2 Dev (1150) provides open-weights access for local deployment.

Flux 2 variants

flux-2-max · 1167
flux-2-flex · 1158
flux-2-pro · 1157
flux-2-dev · 1150
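Whether neighbouring Flux 2 variants are actually separable can be checked roughly by comparing their published ±CI intervals. A sketch using the leaderboard values; treating non-overlapping 95% intervals as "separated" is a conservative heuristic, not a formal significance test, and the helper names are illustrative:

```python
def interval(elo: float, ci: float) -> tuple[float, float]:
    """Return the (low, high) bounds of an Elo estimate with a ±ci interval."""
    return elo - ci, elo + ci


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Elo and ±CI values for the Flux 2 variants, taken from the leaderboard.
FLUX_2 = {
    "flux-2-max": (1167, 4),
    "flux-2-flex": (1158, 4),
    "flux-2-pro": (1157, 4),
    "flux-2-dev": (1150, 5),
}

names = list(FLUX_2)
for a, b in zip(names, names[1:]):  # compare each variant with the next one down
    same = overlaps(interval(*FLUX_2[a]), interval(*FLUX_2[b]))
    print(f"{a} vs {b}: {'overlap' if same else 'separated'}")
```

With these values, Flux 2 Max's interval (1163 to 1171) sits clear of Flux 2 Flex (1154 to 1162), while Flex, Pro, and Dev each overlap their neighbour.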

Bytedance Seedream: The Rising Force

Bytedance's Seedream family has quietly become one of the most-tested model families in the arena. Six variants appear in the top 30, accumulating about 302K combined votes, a sign of significant deployment and user interest.

Seedream 4.5 (Elo 1145, rank 16) leads the family, sitting just below Imagen Ultra 4.0 (1147) and above Imagen 4.0 (1133), and outpacing OpenAI's standard-tier GPT-Image-1 (1115) and GPT-Image-1-Mini (1103). The progression from Seedream 3 (1082) to Seedream 4.5 (1145) represents a 63-point Elo gain between versions 3 and 4.5, rapid improvement by any measure.

Seedream family progress

seedream-4.5 · 1145 · 102K votes
seedream-4-2k · 1141 · 13K votes
seedream-4-fal · 1117 · 12K votes
seedream-4-high-res · 1114 · 117K votes
seedream-5.0-lite · 1113 · 21K votes
seedream-3 · 1082 · 37K votes

Open-Source & Permissive Options

For teams that cannot use proprietary APIs (for data privacy, cost, or licensing reasons), two models stand out with permissive Apache 2.0 licenses, alongside one community-licensed alternative:

qwen-image-2512 · Elo 1136

Alibaba · Rank #18 · Apache 2.0 · 48K votes

The highest-ranked Apache 2.0 model. The license permits commercial use, modification, and redistribution, provided license and attribution notices are retained. Competitive with mid-tier proprietary models, it outperforms standard-tier GPT-Image-1 (1136 vs 1115).

z-image-turbo · Elo 1076

Alibaba · Rank #30 · Apache 2.0 · 12K votes

A turbo-class permissive model. Its lower vote count suggests it is newer, so its Elo has more room to move as comparisons accumulate.

hunyuan-image-3.0 · Elo 1151

Tencent · Rank #13 · Community · 173K votes

The community-licensed alternative with the most votes of any non-proprietary model. Strong photorealism, widely used in Chinese-market deployments.

Methodology

Blind Pairwise Voting

Users are shown two images generated from the same prompt by two different models, with no labels. They select which image they prefer. Because model names are hidden until after the vote, this blind comparison reduces brand bias.
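The blind-comparison flow can be sketched as below. This is a hypothetical illustration, not the arena's code: `sample_battle` and `record_vote` are made-up names, and uniform random pairing of two distinct models is an assumption.

```python
import random


def sample_battle(models: list[str], rng: random.Random) -> tuple[str, str]:
    """Pick two distinct models uniformly at random (assumed pairing scheme).

    The returned order decides which image is shown on the left, so screen
    position carries no information about which model produced it.
    """
    left, right = rng.sample(models, 2)  # distinct models, random order
    return left, right


def record_vote(left: str, right: str, choice: str) -> tuple[str, str]:
    """Map the user's blind 'left'/'right' choice to a (winner, loser) pair.

    Model names are attached only here, after the user has chosen.
    """
    if choice == "left":
        return left, right
    if choice == "right":
        return right, left
    raise ValueError(f"unexpected choice: {choice!r}")


rng = random.Random(0)
left, right = sample_battle(["model-a", "model-b", "model-c"], rng)
winner, loser = record_vote(left, right, "left")
print(f"winner={winner}, loser={loser}")
```

Each recorded (winner, loser) pair is the raw input for the rating step.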

Elo Rating System

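Elo ratings update after each vote based on the gap between the expected and actual outcome: an upset win against a higher-rated model moves both ratings more than an expected win does. The system is the same algorithm used in chess, applied to whichever pair of models meets in each vote. Starting Elo is 1000 for all models.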
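The per-vote update can be sketched with the textbook online Elo rule. The K-factor of 4 (how far one vote moves a rating) is an assumption for illustration; the arena's actual constants, and whether it recomputes ratings in batch rather than online, are not stated here. The 1000 starting rating follows the text.

```python
from collections import defaultdict


def update_elo(ratings: dict[str, float], winner: str, loser: str,
               k: float = 4.0, scale: float = 400.0) -> None:
    """Apply one online Elo update in place for a single pairwise vote."""
    r_w, r_l = ratings[winner], ratings[loser]
    # Winner's expected score before the vote (logistic curve, base 10).
    expected_w = 1.0 / (1.0 + 10 ** ((r_l - r_w) / scale))
    # Actual score is 1 for the winner, so the transfer is k * (1 - expected).
    delta = k * (1.0 - expected_w)
    ratings[winner] = r_w + delta
    ratings[loser] = r_l - delta  # zero-sum: the loser drops by the same amount


# Every model starts at 1000, as described in the methodology.
ratings = defaultdict(lambda: 1000.0)
votes = [("model-a", "model-b"), ("model-a", "model-b"), ("model-b", "model-a")]
for winner, loser in votes:
    update_elo(ratings, winner, loser)
print(dict(ratings))
```

An upset (a lower-rated winner) has a small `expected_w`, so `delta` approaches `k`; an expected win transfers only a fraction of it.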

Confidence Intervals

The ±CI values are 95% bootstrap confidence intervals. Models with fewer votes (e.g. mai-image-2 at 6K) have wider intervals (±8) than heavily voted models like gemini-2.5-flash-image (696K votes, ±3).
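The link between vote count and interval width can be illustrated with a simplified percentile bootstrap. This sketch assumes a single pair of models and synthetic votes, and converts each resampled win rate into an Elo gap by inverting the expected-score formula; a full leaderboard would instead resample all votes and refit every model's rating. The function names are hypothetical.

```python
import math
import random
import statistics


def elo_gap_from_win_rate(p: float, scale: float = 400.0) -> float:
    """Invert the Elo expected-score curve: win rate of A over B -> rating gap."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # guard against log(0) on extreme resamples
    return scale * math.log10(p / (1.0 - p))


def bootstrap_gap_ci(outcomes: list[int], n_boot: int = 1000,
                     seed: int = 0) -> tuple[float, float, float]:
    """Percentile-bootstrap 95% CI for the Elo gap between two models.

    `outcomes` holds 1 for each vote model A won and 0 for each vote B won.
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)
    n = len(outcomes)
    point = elo_gap_from_win_rate(sum(outcomes) / n)
    gaps = []
    for _ in range(n_boot):
        resample = rng.choices(outcomes, k=n)  # resample votes with replacement
        gaps.append(elo_gap_from_win_rate(sum(resample) / n))
    cuts = statistics.quantiles(gaps, n=40)  # 39 cut points at 2.5% steps
    return point, cuts[0], cuts[-1]


# Synthetic votes: model A wins 60% of the time, at two sample sizes.
for n_votes in (100, 4_000):
    wins = n_votes * 6 // 10
    outcomes = [1] * wins + [0] * (n_votes - wins)
    point, lower, upper = bootstrap_gap_ci(outcomes)
    print(f"{n_votes:>5} votes: gap {point:.1f}, 95% CI [{lower:.1f}, {upper:.1f}]")
```

The point estimate is the same in both runs (400·log10(0.6/0.4), about 70.4), but the interval is several times narrower with 4,000 votes, the same direction of effect the table shows between lightly and heavily voted models.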

Prompt Coverage

Prompts span photography, illustration, concept art, product design, portraits, landscapes, and abstract art. The distribution is crowd-sourced from real users rather than curated, reflecting actual use cases.

What Elo Measures

Elo captures human preference, not technical quality metrics like FID (Fréchet Inception Distance) or IS (Inception Score). A model may score worse on FID (a higher value) yet earn a higher Elo if humans simply prefer its aesthetic output. Both matter for different use cases.

Data Source

Leaderboard data is sourced from the LMSYS Chatbot Arena / Imgen Arena project, which runs continuous crowd-sourced evaluations. Scores represent a snapshot with 4.3M total votes across 51 models.

Frequently Asked Questions

What is the best AI image generator in 2025?

Based on 4.3M human preference votes, Gemini 3.1 Flash Image Preview (Elo 1266) leads the text-to-image arena, followed by GPT-Image-1.5 High Fidelity (1244) and Gemini 3 Pro Image Preview (1235). For open-source options, Qwen-Image-2512 (1136, Apache 2.0) is the top freely licensed model.

How does the text-to-image arena work?

The arena shows users two generated images from different models for the same prompt and asks which they prefer. Elo ratings are computed from these head-to-head pairwise comparisons, the same system used in chess rankings. A higher Elo means humans more often prefer that model's images.

Is Flux 2 better than Stable Diffusion?

Yes, by this leaderboard's measure. No Stable Diffusion model appears in the top 30, while Flux 2 Max (Elo 1167), Flux 2 Flex (1158), Flux 2 Pro (1157), and Flux 2 Dev (1150) all rank in the top 15 globally. That makes Black Forest Labs second only to Google in the number of top-15 models.

What is the best open-source text-to-image model?

Qwen-Image-2512 by Alibaba (Elo 1136) is the top Apache 2.0 licensed text-to-image model, ranking 18th overall. Z-Image-Turbo (Elo 1076, also Apache 2.0) is another permissively licensed option. Flux 2 Dev is also available as open weights, though under a non-commercial license.

How does Google's Imagen compare to OpenAI's GPT-Image?

Google dominates the top of the leaderboard. Gemini 3.1 Flash Image Preview (Elo 1266) and the Gemini 3 Pro variants (1235, 1232) rank 1st, 3rd, and 4th, with OpenAI's GPT-Image-1.5 High Fidelity (1244) in 2nd. Comparing Imagen directly, Imagen Ultra 4.0 (1147) and Imagen 4.0 (1133) trail GPT-Image-1.5 High Fidelity but rank above OpenAI's GPT-Image-1 (1115) and GPT-Image-1-Mini (1103).

What is Seedream by Bytedance?

Seedream is Bytedance's family of text-to-image models. Seedream 4.5 (Elo 1145) leads the family at rank 16, with multiple variants (4-2k, 4-fal, 4-high-res, 5.0-lite, and Seedream 3) all placing in the top 30. This makes Bytedance one of the most prolific image generation labs in the arena.

Vendor Summary

Google

6 models in top 30

Top Elo: 1266

Bytedance

6 models in top 30

Top Elo: 1145

Black Forest Labs

4 models in top 30

Top Elo: 1167

Alibaba

4 models in top 30

Top Elo: 1136

OpenAI

3 models in top 30

Top Elo: 1244

xAI

2 models in top 30

Top Elo: 1173

Microsoft AI

2 models in top 30

Top Elo: 1189

Reve

1 model in top 30

Top Elo: 1177

Tencent

1 model in top 30

Top Elo: 1151

Recraft

1 model in top 30

Top Elo: 1102
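The vendor tallies above can be recomputed directly from the top-30 leaderboard. A minimal sketch: the rows are copied from the leaderboard as (rank, model, vendor, Elo), and `summarize_by_vendor` is a hypothetical helper that sorts vendors by model count, then by top Elo.

```python
from collections import defaultdict

# (rank, model, vendor, elo) for the top 30, copied from the leaderboard.
LEADERBOARD = [
    (1, "gemini-3.1-flash-image-preview", "Google", 1266),
    (2, "gpt-image-1.5-high-fidelity", "OpenAI", 1244),
    (3, "gemini-3-pro-image-preview-2k", "Google", 1235),
    (4, "gemini-3-pro-image-preview", "Google", 1232),
    (5, "mai-image-2", "Microsoft AI", 1189),
    (6, "reve-v1.5", "Reve", 1177),
    (7, "grok-imagine-image", "xAI", 1173),
    (8, "flux-2-max", "Black Forest Labs", 1167),
    (9, "grok-imagine-image-pro", "xAI", 1160),
    (10, "flux-2-flex", "Black Forest Labs", 1158),
    (11, "flux-2-pro", "Black Forest Labs", 1157),
    (12, "gemini-2.5-flash-image", "Google", 1154),
    (13, "hunyuan-image-3.0", "Tencent", 1151),
    (14, "flux-2-dev", "Black Forest Labs", 1150),
    (15, "imagen-ultra-4.0", "Google", 1147),
    (16, "seedream-4.5", "Bytedance", 1145),
    (17, "seedream-4-2k", "Bytedance", 1141),
    (18, "qwen-image-2512", "Alibaba", 1136),
    (19, "wan2.6-t2i", "Alibaba", 1135),
    (20, "imagen-4.0", "Google", 1133),
    (21, "seedream-4-fal", "Bytedance", 1117),
    (22, "wan2.5-t2i-preview", "Alibaba", 1115),
    (23, "gpt-image-1", "OpenAI", 1115),
    (24, "seedream-4-high-res", "Bytedance", 1114),
    (25, "seedream-5.0-lite", "Bytedance", 1113),
    (26, "gpt-image-1-mini", "OpenAI", 1103),
    (27, "recraft-v4", "Recraft", 1102),
    (28, "mai-image-1", "Microsoft AI", 1093),
    (29, "seedream-3", "Bytedance", 1082),
    (30, "z-image-turbo", "Alibaba", 1076),
]


def summarize_by_vendor(rows):
    """Return {vendor: (model_count, top_elo)}, ordered by count then top Elo."""
    counts = defaultdict(int)
    best = {}
    for _rank, _model, vendor, elo in rows:
        counts[vendor] += 1
        best[vendor] = max(best.get(vendor, elo), elo)
    order = sorted(counts, key=lambda v: (-counts[v], -best[v]))
    return {v: (counts[v], best[v]) for v in order}


for vendor, (n, top) in summarize_by_vendor(LEADERBOARD).items():
    print(f"{vendor}: {n} models in top 30, top Elo {top}")
```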