Computer Vision

Building systems that understand images and video? Find benchmarks for recognition, detection, segmentation, and document analysis tasks.

9 tasks · 29 datasets

Tasks in Computer Vision

Scene Text Detection

Detecting text regions in natural scene images.

5 datasets

Document OCR

Converting scanned documents and images into machine-readable text.

6 datasets

Handwriting Recognition

Recognizing handwritten text from images.

3 datasets

Document Parsing

Converting documents (like PDFs) into structured formats (Markdown/HTML).

2 datasets

General OCR Capabilities

Comprehensive benchmarks covering multiple aspects of OCR performance.

4 datasets

Document Layout Analysis

Analyzing the layout structure of documents to identify text blocks, figures, tables, and other elements.

0 datasets

Image Classification

Categorizing images into predefined classes (e.g., ImageNet, CIFAR).

5 datasets

Object Detection

Locating and classifying objects in images (e.g., COCO, Pascal VOC).

2 datasets

Semantic Segmentation

Assigning a class label to every pixel of an image (e.g., Cityscapes, ADE20K).

2 datasets

Explore Other Areas

Natural Language Processing

Processing and understanding text? Evaluate your models on language understanding, generation, translation, and information extraction benchmarks.

Reasoning

Testing if your model can think logically? Benchmark math problem solving, commonsense understanding, and multi-step reasoning capabilities.

Computer Code

Developing AI coding assistants? Test code generation, completion, translation, bug detection, and repair capabilities.

Speech

Working with voice and audio? Evaluate speech-to-text accuracy, voice synthesis quality, and speaker identification performance.