MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks.

MEGA-Bench is a large-scale multimodal evaluation suite from TIGER-Lab that consolidates more than 500 real-world tasks into a unified evaluation format. Each task comes with curated, high-quality samples (images/videos plus text) and standardized example/metric fields (e.g., task_name, task_description, example_text, example_media, metric_info, answer, eval_context), enabling cost-effective, accurate evaluation of vision-language models. The Hugging Face dataset provides subsets (e.g., core and open), a test split (core ≈ 6.53k rows), and per-task metadata describing each evaluation metric. The accompanying paper (ICLR 2025, arXiv:2410.10563) describes the benchmark and reports aggregated results, including a macro-averaged score across tasks. License: Apache-2.0. Main resources: paper (arXiv), code (GitHub), dataset and leaderboard on Hugging Face.
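
For orientation, here is a minimal sketch of loading the core subset from Hugging Face and inspecting the standardized fields. The dataset id, config name, split, and field names below follow the description above; verify them against the dataset card before depending on them.

```python
# Minimal sketch: load the MEGA-Bench "core" subset and peek at one row.
# Config/split/field names follow the page description (assumed to match
# the Hugging Face dataset card).
from datasets import load_dataset

ds = load_dataset("TIGER-Lab/MEGA-Bench", "core", split="test")
print(len(ds))  # expected: roughly 6.53k rows for the core test split

row = ds[0]
for field in ("task_name", "task_description", "example_text",
              "example_media", "metric_info", "answer", "eval_context"):
    print(f"{field}: {str(row[field])[:80]}")
```

On the reported macro metric: assuming the usual definition, the macro score is the unweighted mean of per-task scores, so a task with 20 examples counts as much as one with 500. A toy illustration with hypothetical scores:

```python
# Hypothetical per-task scores; macro aggregation averages over tasks, not examples.
task_scores = {"task_a": 0.72, "task_b": 0.55, "task_c": 0.31}
macro_score = sum(task_scores.values()) / len(task_scores)
print(round(macro_score, 3))  # 0.527
```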

§ 01 · Leaderboard

Best published scores.

No results indexed yet — be the first to submit a score.

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

Submit a result · Read submission guide
What a submission needs
  • 01 · A public checkpoint or API endpoint
  • 02 · A reproduction script with a frozen commit + seed (see the sketch after this list)
  • 03 · A declared evaluation environment (Python version, pinned dependencies)
  • 04 · One row per metric declared by this dataset
  • 05 · A contact so we can follow up on discrepancies
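
As referenced in item 02, the hypothetical helper below illustrates the frozen-seed and declared-environment requirements (items 02 and 03). The seed value, output file name, and the placement of the evaluation call are assumptions, not part of the MEGA-Bench tooling.

```python
# Hypothetical submission helper: fix the seed and snapshot the evaluation
# environment so the published score can be re-derived. Illustrative only.
import platform
import random
import subprocess
import sys

SEED = 1234  # declared seed; freeze it alongside the commit you evaluated at
random.seed(SEED)

# Declare the evaluation environment: Python version plus exact dependency pins.
print("python:", platform.python_version())
frozen = subprocess.run(
    [sys.executable, "-m", "pip", "freeze"],
    check=True, capture_output=True, text=True,
).stdout
with open("requirements-frozen.txt", "w") as f:
    f.write(frozen)

# From here, check out the frozen commit of the evaluation code and run the
# benchmark's entry point against your checkpoint or API endpoint, emitting
# one score row per metric declared by the dataset.
```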