AI Image Generation Arena:
Text-to-Image Rankings
Elo ratings computed from 4.3 million pairwise human preference votes across 51 text-to-image models. Users were shown two images for the same prompt and asked which they preferred — no labels, no bias. Updated 2025.
Google leads overall
Five of the top 15 slots are Google models: #1, #3, #4, #12, and #15.
Flux 2 dominates mid-tier
Black Forest Labs places 4 Flux 2 variants (ranks 8, 10, 11, and 14) in the top 15.
Apache 2.0 options exist
Qwen-Image-2512 (#18, Elo 1136) and Z-Image-Turbo (#30) are fully open licensed.
Full Leaderboard — Top 30
Showing top 30 of 51 models. Elo scores use the standard logistic model with a 400-point scale factor. CI = 95% confidence interval on the Elo estimate.
| Rank | Model | Vendor | Elo | ±CI | Votes | License |
|---|---|---|---|---|---|---|
| 🥇 | gemini-3.1-flash-image-preview | Google | 1266 | ±7 | 15K | Proprietary |
| 🥈 | gpt-image-1.5-high-fidelity | OpenAI | 1244 | ±4 | 63K | Proprietary |
| 🥉 | gemini-3-pro-image-preview-2k | Google | 1235 | ±5 | 58K | Proprietary |
| 4 | gemini-3-pro-image-preview | Google | 1232 | ±5 | 83K | Proprietary |
| 5 | mai-image-2 | Microsoft AI | 1189 | ±8 | 6K | Proprietary |
| 6 | reve-v1.5 | Reve | 1177 | ±6 | 8K | Proprietary |
| 7 | grok-imagine-image | xAI | 1173 | ±4 | 49K | Proprietary |
| 8 | flux-2-max | Black Forest Labs | 1167 | ±4 | 66K | Proprietary |
| 9 | grok-imagine-image-pro | xAI | 1160 | ±4 | 48K | Proprietary |
| 10 | flux-2-flex | Black Forest Labs | 1158 | ±4 | 102K | Proprietary |
| 11 | flux-2-pro | Black Forest Labs | 1157 | ±4 | 97K | Proprietary |
| 12 | gemini-2.5-flash-image | Google | 1154 | ±3 | 696K | Proprietary |
| 13 | hunyuan-image-3.0 | Tencent | 1151 | ±3 | 173K | Community |
| 14 | flux-2-dev | Black Forest Labs | 1150 | ±5 | 50K | Proprietary |
| 15 | imagen-ultra-4.0 | Google | 1147 | ±4 | 390K | Proprietary |
| 16 | seedream-4.5 | Bytedance | 1145 | ±4 | 102K | Proprietary |
| 17 | seedream-4-2k | Bytedance | 1141 | ±6 | 13K | Proprietary |
| 18 | qwen-image-2512 | Alibaba | 1136 | ±4 | 48K | Apache 2.0 |
| 19 | wan2.6-t2i | Alibaba | 1135 | ±4 | 43K | Proprietary |
| 20 | imagen-4.0 | Google | 1133 | ±3 | 462K | Proprietary |
| 21 | seedream-4-fal | Bytedance | 1117 | ±6 | 12K | Proprietary |
| 22 | wan2.5-t2i-preview | Alibaba | 1115 | ±4 | 138K | Proprietary |
| 23 | gpt-image-1 | OpenAI | 1115 | ±3 | 266K | Proprietary |
| 24 | seedream-4-high-res | Bytedance | 1114 | ±4 | 117K | Proprietary |
| 25 | seedream-5.0-lite | Bytedance | 1113 | ±5 | 21K | Proprietary |
| 26 | gpt-image-1-mini | OpenAI | 1103 | ±4 | 106K | Proprietary |
| 27 | recraft-v4 | Recraft | 1102 | ±7 | 14K | Proprietary |
| 28 | mai-image-1 | Microsoft AI | 1093 | ±4 | 94K | Proprietary |
| 29 | seedream-3 | Bytedance | 1082 | ±5 | 37K | Proprietary |
| 30 | z-image-turbo | Alibaba | 1076 | ±6 | 12K | Apache 2.0 |
*Chart: Elo score distribution across the top 30 models.*
Google vs Everyone
Google's multimodal image generation capabilities are unmatched at the top of the leaderboard. Gemini 3.1 Flash Image Preview achieves an Elo of 1266 — a full 22 points ahead of OpenAI's GPT-Image-1.5 in second place.
What makes this more striking is range: Google holds six leaderboard positions in the top 20, spanning Gemini 3.1 Flash (#1), Gemini 3 Pro (#3 and #4), Gemini 2.5 Flash (#12), Imagen Ultra 4.0 (#15), and Imagen 4.0 (#20). No other vendor comes close to this breadth.
*Chart: Google models in the top 20.*
Black Forest Labs: Flux 2 Sweep
Black Forest Labs launched the original Flux in 2024 and immediately disrupted the open-weight image generation market. Flux 2 consolidates that lead: four variants cluster tightly between Elo 1150 and 1167, at ranks 8, 10, 11, and 14.
The tight clustering (a 17-point spread across four models) suggests BFL has found a capability ceiling with the current architecture. Flux 2 Max (Elo 1167) edges ahead on quality at the cost of inference speed, while Flux 2 Dev (1150) provides open-weights access for local deployment.
*Chart: Flux 2 variant Elo scores.*
Bytedance Seedream: The Rising Force
Bytedance's Seedream family has quietly become one of the most-tested model families in the arena. Six variants appear in the top 30, accumulating over 300K combined votes — a sign of significant deployment and user interest.
Seedream 4.5 (Elo 1145, rank 16) leads the family, sitting just above Imagen 4.0 and outpacing all OpenAI standard-tier models. The progression from Seedream 3 (1082) to Seedream 4.5 (1145) represents a 63-point Elo gain in a single generation cycle — rapid improvement by any measure.
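Under the standard logistic Elo model with the 400-point scale, a rating gap translates directly into an expected head-to-head win rate. A minimal sketch (the `win_prob` helper is our own, not part of the arena's tooling) applied to the 63-point Seedream gap:

```python
# Expected win probability implied by an Elo gap, using the standard
# logistic Elo formula with the 400-point scale factor.
def win_prob(elo_gap: float) -> float:
    """Probability the higher-rated model wins a single comparison."""
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400))

# Seedream 4.5 (1145) vs Seedream 3 (1082): a 63-point gap.
p = win_prob(1145 - 1082)
print(f"{p:.1%}")  # prints 59.0%
```

In other words, a 63-point jump means human voters prefer Seedream 4.5 over Seedream 3 in roughly 59% of direct matchups.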
*Chart: Seedream family Elo progression.*
Open-Source & Permissive Options
For teams that cannot use proprietary APIs — due to data privacy, cost, or licensing — two models stand out with permissive Apache 2.0 licenses:
Qwen-Image-2512
Alibaba · Rank #18 · Apache 2.0 · 48K votes
The highest-ranked Apache 2.0 model. Commercially usable with no restrictions. Competitive with mid-tier proprietary models, outperforming GPT-Image-1 standard tier.
Z-Image-Turbo
Alibaba · Rank #30 · Apache 2.0 · 12K votes
A turbo-class permissive model. Lower vote count suggests it is newer; headroom for further Elo movement as more comparisons accumulate.
Hunyuan-Image-3.0
Tencent · Rank #13 · Community · 173K votes
The community-licensed alternative with the most votes of any non-proprietary model. Strong photorealism, widely used in Chinese-market deployments.
Methodology
Blind Pairwise Voting
Users are shown two images generated from the same prompt by two different models, with no labels. They select which image they prefer. This blind comparison eliminates brand bias.
Elo Rating System
Elo ratings update after each vote based on expected vs actual outcome. The system is the same algorithm used in chess, adapted for multi-model comparison. Starting Elo is 1000 for all models.
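The update rule described above can be sketched in a few lines. This is the textbook Elo update, not the arena's published implementation; the K-factor of 32 is an assumed value, and the arena may use a different update schedule.

```python
# Minimal sketch of a pairwise Elo update. K=32 is a common chess
# default and an assumption here, not the arena's documented setting.
def expected_score(r_a: float, r_b: float) -> float:
    """Expected probability that model A beats model B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return (new_r_a, new_r_b) after one vote between A and B."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# All models start at Elo 1000; one win moves the pair symmetrically.
ra, rb = elo_update(1000.0, 1000.0, a_won=True)
print(ra, rb)  # prints 1016.0 984.0
```

Note that the total rating is conserved: whatever A gains, B loses, which is why upsets of high-rated models by low-rated ones produce the largest swings.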
Confidence Intervals
The ±CI values are 95% bootstrap confidence intervals. Models with fewer votes (e.g. mai-image-2 at 6K) have wider intervals (±8) than heavily-voted models like gemini-2.5-flash-image (696K votes, ±3).
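To illustrate why vote count drives interval width, here is a simplified bootstrap sketch. The arena bootstraps full Elo refits over all votes; resampling a single model's win/loss record, as below, is a deliberate simplification, and `bootstrap_ci` is our own illustrative helper.

```python
import random

# Simplified percentile-bootstrap CI on a win rate, showing how the
# interval tightens as the vote count grows (roughly as 1/sqrt(n)).
def bootstrap_ci(wins: int, n: int, reps: int = 1000, seed: int = 0):
    """95% bootstrap CI on a win rate from `wins` wins out of `n` votes."""
    rng = random.Random(seed)
    votes = [1] * wins + [0] * (n - wins)
    rates = sorted(sum(rng.choices(votes, k=n)) / n for _ in range(reps))
    return rates[int(0.025 * reps)], rates[int(0.975 * reps)]

# Same 55% win rate, 10x difference in vote count:
lo_few, hi_few = bootstrap_ci(wins=275, n=500)
lo_many, hi_many = bootstrap_ci(wins=2750, n=5000)
print(hi_few - lo_few, hi_many - lo_many)  # the larger sample is tighter
```

This mirrors the pattern in the table: mai-image-2's 6K votes yield a ±8 interval while gemini-2.5-flash-image's 696K votes shrink it to ±3.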
Prompt Coverage
Prompts span photography, illustration, concept art, product design, portraits, landscapes, and abstract art. The distribution is crowd-sourced from real users rather than curated, reflecting actual use cases.
What Elo Measures
Elo captures human preference — not technical quality metrics like FID or IS. A model may have lower FID but higher Elo if humans simply prefer its aesthetic output. Both matter for different use cases.
Data Source
Leaderboard data is sourced from the LMSYS Chatbot Arena / Imgen Arena project, which runs continuous crowd-sourced evaluations. Scores represent a snapshot with 4.3M total votes across 51 models.
Frequently Asked Questions
What is the best AI image generator in 2025?
Based on 4.3M human preference votes, Gemini 3.1 Flash Image Preview (Elo 1266) leads the text-to-image arena, followed by GPT-Image-1.5 High Fidelity (1244) and Gemini 3 Pro Image Preview (1235). For open-source options, Qwen-Image-2512 (1136, Apache 2.0) is the top freely licensed model.
How does the text-to-image arena work?
The arena shows users two generated images from different models for the same prompt, and asks which they prefer. Elo ratings are computed from these head-to-head pairwise comparisons — the same system used in chess rankings. A higher Elo means humans consistently prefer that model's images.
Is Flux 2 better than Stable Diffusion?
Yes. Flux 2 from Black Forest Labs substantially outperforms legacy Stable Diffusion. Flux 2 Max (Elo 1167), Flux 2 Flex (1158), Flux 2 Pro (1157), and Flux 2 Dev (1150) all rank in the top 15 globally, making BFL the top open-weight image generation lab by breadth of strong models.
What is the best open-source text-to-image model?
Qwen-Image-2512 by Alibaba (Elo 1136) is the top Apache 2.0 licensed text-to-image model, ranking 18th overall. Z-Image-Turbo (Elo 1076, also Apache 2.0) is another permissively licensed option. Flux 2 models are also available as open weights, though under a non-commercial license.
How does Google's Imagen compare to OpenAI's GPT-Image?
Google dominates the top of the leaderboard. Gemini 3.1 Flash Image Preview (Elo 1266) and Gemini 3 Pro variants (1235, 1232) rank 1st, 3rd, and 4th. OpenAI's GPT-Image-1.5 High Fidelity (1244) takes 2nd, and Imagen Ultra 4.0 (1147) and Imagen 4.0 (1133) round out Google's strong showing across all tiers.
What is Seedream by Bytedance?
Seedream is Bytedance's family of text-to-image models. Seedream 4.5 (Elo 1145) leads the family at rank 16, with multiple variants (4-2k, 4-fal, 4-high-res, 5.0-lite, and Seedream 3) all placing in the top 30. This makes Bytedance one of the most prolific image generation labs in the arena.
Vendor Summary
| Vendor | Models in Top 30 | Top Elo |
|---|---|---|
| Google | 6 | 1266 |
| Bytedance | 6 | 1145 |
| Black Forest Labs | 4 | 1167 |
| Alibaba | 3 | 1136 |
| OpenAI | 3 | 1244 |
| xAI | 2 | 1173 |
| Microsoft AI | 2 | 1189 |
| Reve | 1 | 1177 |
| Tencent | 1 | 1151 |
| Recraft | 1 | 1102 |