YouTube-VOS (YouTube Video Object Segmentation), released by Ning Xu et al. in 2018, is the first large-scale benchmark for video object segmentation. It targets semi-supervised VOS (given the ground-truth mask of each object in the first frame, segment the same object(s) in every subsequent frame) and provides a much larger and more diverse training set than earlier VOS datasets such as DAVIS. The benchmark contains several thousand high-resolution YouTube clips with high-quality pixel-level annotations sampled at 6 fps (every 5th frame). Commonly reported statistics for the 2018 release: ~4,453 videos (3,471 train / 474 validation / 508 test), more than 7,800 unique object instances, and ~190k manual annotations. The dataset supports multiple VOS tasks (semi-supervised VOS, video instance segmentation, referring VOS), and its validation and test splits include object categories unseen during training, so methods are also evaluated on how well they generalize.
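Semi-supervised VOS results are typically scored with region similarity J (the Jaccard index, i.e. mask IoU) and contour accuracy F, averaged over seen and unseen categories. As a minimal, hedged sketch (not the official evaluation code), the J metric on a pair of binary masks can be computed like this:

```python
import numpy as np

def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """Jaccard index (IoU) between two binary masks: the J metric in VOS.

    Both inputs are H x W arrays; nonzero pixels count as foreground.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)

# Toy example: a 2x2 predicted mask against a 3x3 ground-truth mask.
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:3] = True  # 4 foreground pixels
gt = np.zeros((4, 4), dtype=bool)
gt[1:4, 1:4] = True    # 9 foreground pixels
print(region_similarity(pred, gt))  # intersection 4, union 9 -> 0.444...
```

The official benchmark averages J and F separately over seen and unseen object categories and reports the mean of those four numbers as the overall score; per-frame J is computed as above for each annotated frame and averaged per object.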
No results have been indexed yet. Be the first to submit a score.
Submit a checkpoint and a reproduction script. We will run it and publish the score, and if it takes the top spot, we will annotate the step on the progress chart with your name.