HuggingFace ↔ CodeSOTA

Every HuggingFace pipeline task mapped to CodeSOTA benchmarks. Find which benchmarks evaluate models for any HF task.

52 HF tasks
52 Mapped
52 With benchmarks
94 CodeSOTA-only

Multimodal

Computer Vision

Natural Language Processing

Audio

Tabular

Reinforcement Learning

HF Pipeline Tag | CodeSOTA Task | Area | Benchmarks | Results

Other

HF Pipeline Tag | CodeSOTA Task | Area | Benchmarks | Results

CodeSOTA-only tasks

Tasks tracked by CodeSOTA that don't have a direct HuggingFace pipeline equivalent.

Adversarial

Agentic AI

Agent Memory: 2 benchmarks
Autonomous Coding: 2 benchmarks, 23 results
Bioinformatics Agents: 1 benchmark, 2 results
HCAST: 1 benchmark, 6 results
RE-Bench: 1 benchmark, 5 results
SWE-bench: 1 benchmark, 81 results
Task agents: 7 benchmarks, 35 results
Time Horizon: 1 benchmark, 5 results
Tool Use: 1 benchmark, 8 results
Web & Desktop Agents: 2 benchmarks, 19 results

Audio

Audio Captioning: 1 benchmark, 3 results
Music Generation: 1 benchmark, 3 results
Sound Event Detection: 1 benchmark, 3 results
Voice cloning: 1 benchmark, 3 results

Computer Code

Bug Detection: 1 benchmark, 6 results
Code Completion: 1 benchmark, 6 results
Code Generation: 10 benchmarks, 196 results
Code Summarization: 1 benchmark, 3 results
Code Translation: 1 benchmark, 7 results
Program Repair: 1 benchmark, 5 results

Computer Vision

3D Understanding: 4 benchmarks
Document Image Classification: 7 benchmarks, 62 results
Document Layout Analysis: 5 benchmarks, 133 results
Document Parsing: 3 benchmarks, 117 results
General OCR Capabilities: 4 benchmarks, 66 results
Handwriting Recognition: 6 benchmarks, 40 results
Image editing: 5 benchmarks
Image generation: 11 benchmarks
Image segmentation: 9 benchmarks, 3 results
OCR: 5 benchmarks, 1 result
Object counting: 1 benchmark
Optical Character Recognition: 114 benchmarks, 829 results
Scene Text Detection: 11 benchmarks, 581 results
Scene Text Recognition: 11 benchmarks, 127 results
Table Recognition: 5 benchmarks, 71 results
Video segmentation: 3 benchmarks

General

Coding Agents: 7 benchmarks, 4 results
Computer Use Agents: 11 benchmarks
General: 1 benchmark
Omni models: 2 benchmarks
Retrieval: 7 benchmarks
Video-Language Models: 19 benchmarks, 4 results

Graphs

Link Prediction: 1 benchmark, 3 results
Molecular Property Prediction: 1 benchmark, 3 results

Industrial Inspection

Anomaly Detection: 7 benchmarks, 27 results
Weld Inspection: 1 benchmark

Knowledge Base

Entity Linking: 1 benchmark, 3 results
Knowledge Graph Completion: 1 benchmark, 3 results
Relation Extraction: 1 benchmark, 3 results

Medical

Clinical NLP: 1 benchmark
Disease Classification: 9 benchmarks, 57 results
Drug Discovery: 1 benchmark
Medical Image Segmentation: 4 benchmarks, 26 results

Methodology

Mobile Development

React Native Code Generation: 1 benchmark, 40 results

Natural Language Processing

Natural Language Inference: 1 benchmark, 8 results
Polish Conversation Quality: 1 benchmark, 450 results
Polish Cultural Competency: 1 benchmark, 1155 results
Polish Emotional Intelligence: 1 benchmark, 101 results
Polish LLM General: 1 benchmark, 3728 results
Polish Text Understanding: 1 benchmark, 465 results

Other

Other: 4 benchmarks

Reasoning

Arithmetic Reasoning: 2 benchmarks, 6 results
Commonsense Reasoning: 6 benchmarks, 82 results
Logical Reasoning: 4 benchmarks, 12 results
Mathematical Reasoning: 4 benchmarks, 79 results
Multi-step Reasoning: 4 benchmarks, 53 results

Reinforcement Learning

Continuous Control: 1 benchmark, 9 results
Offline RL: 1 benchmark

Robots

Robot Manipulation: 1 benchmark
Robot Navigation: 1 benchmark

Speech

Speaker Verification: 1 benchmark, 3 results
Speech Translation: 1 benchmark, 3 results

Time-series