Benchmark Ontology
Complete hierarchy of ML benchmarks. Navigate from research areas to specific datasets and compare model performance.
17 Areas · 84 Tasks · 227 Datasets · 613 Models · 1,777 Results · 302 Papers
Hierarchy Structure
Building systems that understand images and video? Find benchmarks for recognition, detection, segmentation, and document analysis tasks.
Testing if your model can think logically? Benchmark math problem solving, commonsense understanding, and multi-step reasoning capabilities.
Building healthcare AI? Find benchmarks for medical imaging, disease diagnosis, clinical text processing, and drug discovery.
Building quality control systems? Benchmark anomaly detection, defect classification, and automated visual inspection for manufacturing.
Developing AI coding assistants? Test code generation, completion, translation, bug detection, and repair capabilities.
Training agents to make decisions? Benchmark your policies on game playing, continuous control, and offline learning tasks.
Working with network data? Test graph learning models on node classification, link prediction, and molecular property tasks.
Building robotic systems? Find benchmarks for manipulation, navigation, and simulation-to-reality transfer.
Working with voice and audio? Evaluate speech-to-text accuracy, voice synthesis quality, and speaker identification performance.
Need to test model robustness? Benchmark resilience against adversarial attacks and evaluate defense mechanisms.
Predicting future trends or detecting anomalies? Benchmark forecasting accuracy and pattern recognition in sequential data.
Measuring autonomous AI capabilities? METR benchmarks track time horizon, multi-step reasoning, and sustained task performance, all key metrics for AGI progress.
Processing general audio signals? Test your models on sound classification, event detection, music analysis, and source separation.
Building knowledge systems? Evaluate graph completion, relation extraction, and entity linking performance.
Improving learning efficiency? Test self-supervised, few-shot, transfer, and continual learning approaches.
Combining vision and language? Evaluate image captioning, visual QA, text-to-image generation, and cross-modal retrieval models.
Processing and understanding text? Evaluate your models on language understanding, generation, translation, and information extraction benchmarks.
How to Navigate
1. Choose an Area
Start with a research domain like Computer Vision or NLP that matches your problem space.
2. Select a Task
Find the specific problem you are solving, such as OCR, Text Classification, or Object Detection.
3. Pick a Dataset
Choose a benchmark dataset to evaluate your model and compare against state-of-the-art results.