coMplex video Object SEgmentation (MOSE).

MOSE (coMplex video Object SEgmentation) is a video object segmentation (VOS) dataset built to study VOS in complex, realistic scenes, where target objects are often small, inconspicuous, heavily occluded, in crowded environments, or disappear and reappear. It contains 2,149 video clips with 5,200 target objects and 431,725 high-quality per-frame object segmentation masks; videos are typically 1920×1080 and 5–60 seconds long. The dataset benchmarks the robustness of tracking and segmentation in these challenging scenarios. Evaluation uses the standard VOS metrics: region similarity J, contour accuracy F, and their mean J&F. The dataset and benchmark were published in the ICCV 2023 paper "MOSE: A New Dataset for Video Object Segmentation in Complex Scenes" (arXiv:2302.01872) and have an associated project and competition site (MOSE challenge and evaluation servers).
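To make the J and F metrics named above concrete, here is a minimal pure-Python sketch for a single frame and a single object. It is not the official MOSE evaluation code. J is computed as intersection-over-union of binary masks. F is simplified: boundary pixels must match exactly, whereas the standard evaluation tolerates small boundary offsets. The function names and the list-of-lists mask representation are assumptions made for illustration.

```python
from typing import List, Set, Tuple

Mask = List[List[int]]  # binary mask: 1 = object pixel, 0 = background


def region_similarity(pred: Mask, gt: Mask) -> float:
    """J: intersection-over-union of two binary masks of equal size."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def boundary(mask: Mask) -> Set[Tuple[int, int]]:
    """Object pixels with a 4-neighbour that is background or off-image."""
    h = len(mask)
    w = len(mask[0]) if mask else 0
    points = set()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    points.add((y, x))
                    break
    return points


def contour_accuracy(pred: Mask, gt: Mask) -> float:
    """Simplified F: F-measure over exactly matching boundary pixels."""
    bp, bg = boundary(pred), boundary(gt)
    if not bp and not bg:
        return 1.0
    if not bp or not bg:
        return 0.0
    matched = len(bp & bg)
    precision = matched / len(bp)
    recall = matched / len(bg)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def j_and_f(pred: Mask, gt: Mask) -> float:
    """J&F: arithmetic mean of region similarity and contour accuracy."""
    return (region_similarity(pred, gt) + contour_accuracy(pred, gt)) / 2


if __name__ == "__main__":
    gt = [[1, 1], [0, 0]]
    pred = [[1, 0], [0, 0]]
    print(region_similarity(pred, gt), contour_accuracy(pred, gt), j_and_f(pred, gt))
```

In a full evaluation these per-frame values would be averaged over the frames and objects of each video. That aggregation is omitted here.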


§ 01 · Leaderboard

Best published scores.

No results indexed yet — be the first to submit a score.


§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and — if it takes the top — annotate the step on the progress chart with your name.


What a submission needs

01 A public checkpoint or API endpoint
02 A reproduction script with frozen commit + seed
03 Declared evaluation environment (Python, deps)
04 One row per metric declared by this dataset
05 A contact so we can follow up on discrepancies
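The checklist above could be captured in a small machine-readable record, sketched below. The field names, the `build_manifest` and `missing_metrics` helpers, and the manifest layout are all hypothetical; the actual submission format is defined by the submission guide, not by this example. The sketch records the commit, seed, and evaluation environment, then checks that every declared metric has a row.

```python
import json
import platform
from typing import Dict, List


def build_manifest(commit: str, seed: int, deps: List[str],
                   scores: Dict[str, float], contact: str) -> dict:
    """Assemble a hypothetical submission record from the checklist items."""
    return {
        "commit": commit,            # frozen commit of the reproduction script
        "seed": seed,                # random seed used for the run
        "environment": {
            "python": platform.python_version(),
            "platform": platform.platform(),
            "deps": sorted(deps),    # pinned dependencies, e.g. "pkg==1.2.3"
        },
        "scores": scores,            # one entry per declared metric
        "contact": contact,
    }


def missing_metrics(manifest: dict, declared: List[str]) -> List[str]:
    """Return the declared metrics that have no row in the manifest."""
    return [m for m in declared if m not in manifest["scores"]]


if __name__ == "__main__":
    # Placeholder values only; these are not real results.
    manifest = build_manifest(
        commit="<commit-hash>",
        seed=0,
        deps=["example-package==0.0.0"],
        scores={"J&F": 0.0},
        contact="name@example.org",
    )
    print(json.dumps(manifest, indent=2))
    print("missing:", missing_metrics(manifest, ["J", "F", "J&F"]))
```

Checking for missing metric rows before submitting helps avoid a round trip over an incomplete entry.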
