ImageNet (ILSVRC2012) is a large-scale image classification dataset organized according to the WordNet hierarchy. The standard ImageNet-1k (ILSVRC) split contains 1,000 classes, with about 1.28M training images (1,281,167) and 50,000 validation images. In the image-generation literature, the term “ImageNet 1024×1024” typically denotes the ImageNet-1k images resized (or center-cropped and resampled) to 1024×1024 resolution for high-resolution synthesis and evaluation. A common evaluation practice, used by many generative-model papers, is to generate 50,000 images (50 samples for each of the 1,000 classes) and compute the Fréchet Inception Distance (FID) against the ImageNet training set at the target resolution.
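The sampling and preprocessing conventions described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation of any paper's protocol: the helper names `balanced_labels` and `center_crop_box` are hypothetical, and the crop box uses the (left, top, right, bottom) ordering that many image libraries accept. Resizing the cropped square to 1024×1024 and computing FID are left to whichever image and metrics libraries you use.

```python
from collections import Counter

NUM_CLASSES = 1000       # ImageNet-1k classes
SAMPLES_PER_CLASS = 50   # 50 x 1000 = 50,000 generated images
TARGET_SIZE = 1024       # target side length; the resize step itself is not shown


def balanced_labels(num_classes=NUM_CLASSES, per_class=SAMPLES_PER_CLASS):
    """Return one class index per image to generate, with per_class copies of each class."""
    return [c for c in range(num_classes) for _ in range(per_class)]


def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square in the image."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


labels = balanced_labels()
counts = Counter(labels)

print(len(labels))                # 50000
print(center_crop_box(500, 375))  # (62, 0, 437, 375)
```

Each entry in `labels` would be passed as the conditioning class for one generated sample, so every class contributes exactly 50 images to the evaluation set.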
No results indexed yet — be the first to submit a score.
Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.