MEGA-Bench is a large-scale multimodal evaluation suite from TIGER-Lab that consolidates over 500 real-world multimodal tasks into a unified evaluation format. It provides curated, high-quality data samples (images/videos plus text) with standardized example and metric fields (e.g., task_name, task_description, example_text, example_media, metric_info, answer, eval_context), enabling cost-effective and accurate evaluation of multimodal and vision-language models. The Hugging Face dataset includes subsets (e.g., core and open), a test split (core ≈ 6.53k rows), and metadata describing each task and its evaluation metric. The accompanying paper (ICLR 2025, arXiv:2410.10563) describes the benchmark and reports aggregated results, including a macro-averaged metric across tasks. License: Apache-2.0. Main resources: paper (arXiv), code (GitHub), dataset and leaderboard on Hugging Face.
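The macro metric mentioned above averages per-task scores so that every task counts equally, regardless of how many examples it contains. A minimal sketch of that aggregation, assuming per-task scores have already been computed and normalized to [0, 1] (the task names and values below are hypothetical placeholders, not actual MEGA-Bench tasks or results):

```python
def macro_average(task_scores):
    """Average per-task scores with equal weight per task.

    task_scores: dict mapping task_name -> mean score in [0, 1].
    """
    if not task_scores:
        raise ValueError("no task scores provided")
    return sum(task_scores.values()) / len(task_scores)

# Hypothetical example scores for three tasks:
scores = {"chart_qa": 0.62, "video_captioning": 0.41, "ocr": 0.77}
print(macro_average(scores))
```

This contrasts with a micro average, which would pool all examples together and let tasks with more rows dominate the aggregate.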
No results indexed yet — be the first to submit a score.
Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.