Text-to-Image Generation · benchmark dataset · 2023 · EN

GenEval.

Evaluates compositional text-to-image generation with object-level criteria.

§ 01 · Leaderboard

Best published scores.

8 results indexed across 1 metric. Shaded row marks current SOTA; ties broken by submission date.


Primary metric: geneval-score (accuracy · higher is better) · 8 rows

| # | Model | Org | Submitted | Paper / code | geneval-score |
|---|---|---|---|---|---|
| 01 | Lumina-DiMOO w/ Self-GRPO | | Oct 2025 | Lumina-DiMOO: An Omni Diffusion Large Language Model for… · code | 0.910 |
| 02 | BLIP3o-NEXT-GRPO-GenEval (3B) | | Oct 2025 | BLIP3o-NEXT: Next Frontier of Native Image Generation · code | 0.910 |
| 03 | SenseNova-U1-A3B-MoT | SenseTime | May 2026 | SenseNova-U1: Unifying Multimodal Understanding and Gene… · code | 0.910 |
| 04 | BAGEL (7B MoT) with LLM rewriter | | May 2025 | Emerging Properties in Unified Multimodal Pretraining · code | 0.880 |
| 05 | Emu3.5 (34B, AR) | | Oct 2025 | Emu3.5: Native Multimodal Models are World Learners · code | 0.860 |
| 06 | BLIP3-o (8B) | | May 2025 | BLIP3-o: A Family of Fully Open Unified Multimodal Model… · code | 0.840 |
| 07 | AsymFLUX.2 klein | | May 2026 | Asymmetric Flow Models · code | 0.820 |
| 08 | Spectral Progressive Diffusion (PixelGen, TF) | | May 2026 | Spectral Progressive Diffusion for Efficient Image and V… | 0.782 |
Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
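The ordering rule stated for this table (sort by score, break ties by submission date) can be sketched in a few lines of Python. This is only an illustration, not the site's implementation: the `Row` type and `rank` function are hypothetical names, the tie-break direction (earlier submission first) is an assumption inferred from the Oct 2025 entries preceding the May 2026 entry at 0.910, and the day of month is a placeholder because the table shows only month and year.

```python
from datetime import date
from typing import NamedTuple


class Row(NamedTuple):
    """Hypothetical leaderboard row; not the site's actual schema."""
    model: str
    submitted: date  # table shows month/year only, so the day is a placeholder
    score: float


def rank(rows: list[Row]) -> list[Row]:
    # Higher score first; among equal scores, earlier submission first
    # (the direction of the tie-break is an assumption).
    return sorted(rows, key=lambda r: (-r.score, r.submitted))


# Three rows taken from the table above, deliberately out of order.
rows = [
    Row("BAGEL (7B MoT) with LLM rewriter", date(2025, 5, 1), 0.880),
    Row("SenseNova-U1-A3B-MoT", date(2026, 5, 1), 0.910),
    Row("Lumina-DiMOO w/ Self-GRPO", date(2025, 10, 1), 0.910),
]

ranked = rank(rows)
for position, row in enumerate(ranked, start=1):
    print(f"{position:02d} {row.model} {row.score:.3f}")
```

Two rows that share both score and month cannot be separated with the information shown in the table, so any real implementation would need the full submission date or a further rule.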
§ 04 · Literature

8 papers tied to this benchmark.

Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

What a submission needs
  • 01 A public checkpoint or API endpoint
  • 02 A reproduction script with frozen commit + seed
  • 03 Declared evaluation environment (Python, deps)
  • 04 One row per metric declared by this dataset
  • 05 A contact so we can follow up on discrepancies
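
One way to keep the five items above together is a small record that is checked before submitting. The sketch below is an assumption, not the site's submission format: the `Submission` class, its field names, and every value in the example are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical record mirroring the five checklist items."""
    checkpoint_or_endpoint: str   # 01: public checkpoint or API endpoint
    repro_commit: str             # 02: frozen commit of the reproduction script
    seed: int                     # 02: seed used for the run
    environment: dict[str, str]   # 03: Python version and pinned dependencies
    scores: dict[str, float]      # 04: one entry per metric the dataset declares
    contact: str                  # 05: where to follow up on discrepancies

    def missing_metrics(self, declared: list[str]) -> list[str]:
        # Return declared metrics that have no reported score.
        return [metric for metric in declared if metric not in self.scores]


# Placeholder values only; none of these come from a real submission.
submission = Submission(
    checkpoint_or_endpoint="https://example.com/checkpoint",
    repro_commit="0000000",
    seed=0,
    environment={"python": "3.11"},
    scores={"geneval-score": 0.5},
    contact="author@example.com",
)

# This dataset declares a single metric, geneval-score.
print(submission.missing_metrics(["geneval-score"]))
```

An empty list from `missing_metrics` means every declared metric has a row, which is what item 04 asks for.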