Evaluates compositional text-to-image generation with object-level criteria.
8 results indexed across 1 metric. Shaded row marks current SOTA; ties broken by submission date.
| # | Model | Org | Submitted | Paper / code | geneval-score |
|---|---|---|---|---|---|
| 01 | Lumina-DiMOO w/ Self-GRPO | — | Oct 2025 | Lumina-DiMOO: An Omni Diffusion Large Language Model for… · code | 0.910 |
| 02 | BLIP3o-NEXT-GRPO-GenEval (3B) | — | Oct 2025 | BLIP3o-NEXT: Next Frontier of Native Image Generation · code | 0.910 |
| 03 | SenseNova-U1-A3B-MoT | SenseTime | May 2026 | SenseNova-U1: Unifying Multimodal Understanding and Gene… · code | 0.910 |
| 04 | BAGEL (7B MoT) with LLM rewriter | — | May 2025 | Emerging Properties in Unified Multimodal Pretraining · code | 0.880 |
| 05 | Emu3.5 (34B, AR) | — | Oct 2025 | Emu3.5: Native Multimodal Models are World Learners · code | 0.860 |
| 06 | BLIP3-o (8B) | — | May 2025 | BLIP3-o: A Family of Fully Open Unified Multimodal Model… · code | 0.840 |
| 07 | AsymFLUX.2 klein | — | May 2026 | Asymmetric Flow Models · code | 0.820 |
| 08 | Spectral Progressive Diffusion (PixelGen, TF) | — | May 2026 | Spectral Progressive Diffusion for Efficient Image and V… | 0.782 |
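
The ordering rule stated for this table (highest score first, ties broken by submission date) can be sketched in a few lines. This is an illustrative sketch, not the leaderboard's own code. It assumes that an earlier submission wins a tie, which is inferred from the row order in the table, and that rows sharing both score and month keep their input order. The names `MONTHS`, `sort_key`, and `rank` are hypothetical.

```python
# Month abbreviations as they appear in the "Submitted" column.
MONTHS = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# A few rows copied from the table, deliberately shuffled:
# (model, submitted, geneval_score)
ROWS = [
    ("SenseNova-U1-A3B-MoT", "May 2026", 0.910),
    ("BAGEL (7B MoT) with LLM rewriter", "May 2025", 0.880),
    ("Lumina-DiMOO w/ Self-GRPO", "Oct 2025", 0.910),
    ("BLIP3o-NEXT-GRPO-GenEval (3B)", "Oct 2025", 0.910),
    ("Emu3.5 (34B, AR)", "Oct 2025", 0.860),
]

def sort_key(row):
    """Score descending, then (year, month) ascending (assumed tie-break)."""
    _model, submitted, score = row
    month_name, year = submitted.split()
    return (-score, int(year), MONTHS[month_name])

def rank(rows):
    # sorted() is stable, so rows with identical keys keep their input order.
    return sorted(rows, key=sort_key)

ranked = rank(ROWS)
for position, (model, submitted, score) in enumerate(ranked, start=1):
    print(f"{position:02d}  {model}  {submitted}  {score:.3f}")
```

Under these assumptions the three 0.910 entries sort with the two Oct 2025 submissions ahead of the May 2026 one, matching the table.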
Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.
Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.