YouTube-VOS (YouTube Video Object Segmentation), released by Ning Xu et al. in 2018, is the first large-scale benchmark for video object segmentation. It targets semi-supervised VOS (given the ground-truth mask of each object in the first frame, segment the same object(s) in every subsequent frame) and provides a much larger and more diverse training set than earlier VOS datasets such as DAVIS. The benchmark contains several thousand high-resolution YouTube clips with high-quality pixel-level annotations sampled at 6 fps (every 5th frame). Commonly reported statistics for the 2018 release: ~4,453 videos (3,471 train / 474 validation / 508 test), more than 7,800 unique object instances, and ~190k manual annotations. The dataset supports multiple VOS tasks (semi-supervised VOS, video instance segmentation, referring VOS), and its validation and test splits include object categories unseen during training, so methods are also evaluated on how well they generalize.
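Semi-supervised VOS results are typically scored with region similarity J (the Jaccard index, i.e. mask IoU) and contour accuracy F, averaged over seen and unseen categories. As a minimal, hedged sketch (not the official evaluation code), the J metric on a pair of binary masks can be computed like this:

```python
import numpy as np

def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """Jaccard index (IoU) between two binary masks: the J metric in VOS.

    Both inputs are H x W arrays; nonzero pixels count as foreground.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)

# Toy example: a 2x2 predicted mask against a 3x3 ground-truth mask.
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:3] = True  # 4 foreground pixels
gt = np.zeros((4, 4), dtype=bool)
gt[1:4, 1:4] = True    # 9 foreground pixels
print(region_similarity(pred, gt))  # intersection 4, union 9 -> 0.444...
```

The official benchmark averages J and F separately over seen and unseen object categories and reports the mean of those four numbers as the overall score; per-frame J is computed as above for each annotated frame and averaged per object.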
No results have been indexed yet. Be the first to submit a score.
Submit a checkpoint and a reproduction script. We will run it and publish the score, and if it takes the top spot, we will annotate the step on the progress chart with your name.