Image Feature Extraction
Image feature extraction produces dense vector representations that encode visual semantics: the hidden-layer outputs that power retrieval, clustering, similarity search, and transfer learning. The field progressed from hand-crafted descriptors (SIFT, SURF) to CNN features (ResNet, EfficientNet) to self-supervised vision transformers like DINOv2 (2023), whose frozen features rival task-specific models on segmentation, depth estimation, and classification without fine-tuning the backbone. DINOv2's results showed that visual foundation models can support the "extract and use everywhere" paradigm that BERT established in NLP. The quality of the feature extractor largely sets the ceiling for the downstream vision tasks built on it.
ImageNet kNN
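Evaluation protocol for self-supervised feature extractors: the backbone is kept frozen, features are extracted for ImageNet-1k, and a k-nearest-neighbor (kNN) classifier labels each validation image from its nearest training features. Standard in DINO, DINOv2, and iBOT.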
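The kNN protocol can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference evaluation code of any of the papers named above: it assumes features have already been extracted by a frozen backbone, and it replaces them with synthetic toy vectors so the example runs on its own. The function names (`l2_normalize`, `knn_classify`), the toy data, and the defaults `k=20` and `temperature=0.07` are assumptions chosen for illustration; the similarity-weighted vote is one common variant of kNN classification.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def knn_classify(train_feats, train_labels, test_feats, num_classes,
                 k=20, temperature=0.07):
    """Similarity-weighted kNN vote over frozen features (illustrative sketch)."""
    train = l2_normalize(train_feats)
    test = l2_normalize(test_feats)
    sims = test @ train.T  # shape: (n_test, n_train)

    # Indices of the k most similar training features for each test feature.
    topk_idx = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    topk_sims = np.take_along_axis(sims, topk_idx, axis=1)
    topk_labels = train_labels[topk_idx]

    # Each neighbor votes for its own label, weighted by exp(similarity / T).
    weights = np.exp(topk_sims / temperature)
    votes = np.zeros((test.shape[0], num_classes))
    rows = np.arange(test.shape[0])[:, None]
    np.add.at(votes, (rows, topk_labels), weights)
    return votes.argmax(axis=1)


# Toy stand-in for extracted features: three well-separated class clusters.
# In a real evaluation these arrays would come from the frozen backbone.
rng = np.random.default_rng(0)
num_classes, dim = 3, 64
centers = rng.normal(size=(num_classes, dim))

train_labels = rng.integers(0, num_classes, size=300)
test_labels = rng.integers(0, num_classes, size=60)
train_feats = centers[train_labels] + 0.1 * rng.normal(size=(300, dim))
test_feats = centers[test_labels] + 0.1 * rng.normal(size=(60, dim))

preds = knn_classify(train_feats, train_labels, test_feats, num_classes)
accuracy = float((preds == test_labels).mean())
print(f"kNN top-1 accuracy on toy features: {accuracy:.3f}")
```

Because the classifier has no trainable parameters, the resulting accuracy reflects the geometry of the feature space itself, which is why this protocol is a convenient probe of feature quality. At ImageNet-1k scale the full similarity matrix would not fit in memory at once, so a practical version would process the test features in batches.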
Top 10
Leading models on ImageNet kNN.
All datasets
1 dataset tracked for this task.