ML Benchmarks & Leaderboards
Compare AI models across standardized benchmarks. Find the best models for your use case.
Chatbot Quality
Chatbot Quality Monitoring
Purpose-built metrics for monitoring chatbot quality: response accuracy, hallucination rates, and domain-specific evaluation.
MTEB
Massive Text Embedding Benchmark
Compare embedding models across retrieval, classification, clustering, and semantic textual similarity tasks.
BEIR
Benchmarking Information Retrieval
Zero-shot evaluation of retrieval models across 18 diverse datasets.
ImageNet
ImageNet Classification
The standard benchmark for image classification models.
COCO
Common Objects in Context
Object detection, segmentation, and captioning benchmarks.
LibriSpeech
LibriSpeech ASR
Standard benchmark for automatic speech recognition (ASR) models.
MMLU
Massive Multitask Language Understanding
Test LLMs across 57 academic subjects, from STEM to the humanities.
Missing a benchmark?
Let us know which benchmarks you'd like to see on CodeSOTA. We prioritize new additions based on user demand.
Request a Benchmark