Vision & Documents

Images, video frames, OCR, layout, tables, document parsing, detection, segmentation, and visual anomaly detection.

16 tasks · 25 datasets

Tasks in Vision & Documents

Image Classification

Categorizing images into predefined classes.

5 datasets

Object Detection

Locating and classifying objects in images.

2 datasets

Semantic Segmentation

Assigning a class label to every pixel in an image.

2 datasets

Instance Segmentation

Separating individual object instances at the pixel level.

0 datasets

Depth Estimation

Estimating scene depth from visual input.

0 datasets

Pose / Keypoint Detection

Detecting body, object, or hand keypoints.

0 datasets

Video Understanding

Understanding actions, events, and semantics in video.

0 datasets

Document OCR

Converting scanned documents and images into machine-readable text.

6 datasets

Scene Text Detection

Detecting text regions in natural scene images.

5 datasets

Scene Text Recognition

Recognizing text content in natural scene images.

0 datasets

Handwriting Recognition

Recognizing handwritten text from images.

2 datasets

Document Layout Analysis

Identifying document blocks, tables, figures, and reading order.

0 datasets

Document Parsing

Converting PDFs and document images into structured formats.

3 datasets

Table Recognition

Extracting table structure and cells from documents.

0 datasets

Document Question Answering

Answering questions over visual documents.

0 datasets

Visual Anomaly Detection

Finding visual defects and outliers in images.

0 datasets

Explore Other Areas

Language & Knowledge

Language understanding, retrieval, QA, RAG, factuality, information extraction, multilingual evaluation, and knowledge-heavy reasoning.

Audio & Speech

ASR, TTS, speaker intelligence, music, sound events, audio-language understanding, and audio safety.

Multimodal Media

Cross-modal image, text, audio, video, and 3D tasks where input and output span multiple media types.

Code & Software Engineering

Code generation, completion, repair, repository understanding, tests, vulnerability work, UI code, and mobile app code generation.