Which benchmark should you trust?
Tasks answer what problem you are solving; benchmark status answers whether the evidence behind a score is still useful. This page separates active evaluations from saturated, superseded, and unmapped leaderboards so that old scores do not masquerade as current capability.
TTS speed vs quality vs cost.
Compare Gradium, ElevenLabs, Cartesia, OpenAI, and other TTS providers on the metrics that matter for voice agents: WER, critical entity accuracy, p95 first-byte latency, severe error count, and cost per 1K characters.
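A minimal sketch of how the operational metrics above can be computed, assuming WER is word-level edit distance over reference length, p95 first-byte latency is the 95th percentile of measured latencies, and cost is normalised per 1K characters; the transcripts, latencies, and prices below are placeholders, not measured provider results.

```python
# Placeholder metric helpers for TTS comparison; numbers are illustrative only.
from statistics import quantiles


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1] / max(len(ref), 1)


def p95_ms(latencies_ms: list[float]) -> float:
    """95th percentile of measured first-byte latencies, in milliseconds."""
    return quantiles(latencies_ms, n=100)[94]


def cost_per_1k_chars(total_usd: float, characters: int) -> float:
    """Spend normalised to cost per 1K synthesised characters."""
    return total_usd / characters * 1000


# Round-trip scoring: synthesise the text, transcribe it with an ASR model,
# then compare the transcript against the input.
print(wer("pay invoice 4812 by march third",
          "pay invoice 4812 by march 3rd"))          # 1 substitution in 6 words, ~0.17
print(p95_ms([180, 220, 205, 240, 950, 210, 190]))   # tail latency of sample requests
print(cost_per_1k_chars(total_usd=1.20, characters=48_000))  # 0.025 USD per 1K chars
```

Severe error count and critical entity accuracy additionally need labelled entities (names, numbers, dates) in the reference text, so they are not sketched here.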
Status before scores.
A leaderboard with many rows can still be obsolete. Start with benchmark status, then inspect result density and source quality.
Active
Still discriminates frontier systems. Use these for current model comparisons.
Saturating
Useful, but ceiling effects or contamination risks are visible. Read the successor context.
Saturated
Good historical anchor, weak frontier signal. Prefer the successor benchmark.
Superseded
Replaced by a cleaner, harder, or more representative evaluation artifact.
Unmapped
Tracked leaderboard without curated lineage status yet. Treat as coverage backlog.
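A minimal sketch of the triage order described above (status first, then result density and verification), assuming each catalog row carries the benchmark, status, results, and verified fields shown in the tables below; the example rows are copied from this page for illustration.

```python
# Illustrative triage over catalog rows; values copied from the tables on this page.
rows = [
    {"benchmark": "OmniDocBench", "status": "Active", "results": 47, "verified": 11},
    {"benchmark": "HumanEval", "status": "Saturated", "results": 33, "verified": 15},
    {"benchmark": "GPQA", "status": "Active", "results": 17, "verified": 0},
    {"benchmark": "ABIDE I", "status": "Unmapped", "results": 21, "verified": 0},
]

# 1. Status first: keep only benchmarks that still discriminate frontier systems.
current = [r for r in rows if r["status"] in ("Active", "Saturating")]

# 2. Then result density and source quality: prefer rows with more verified results.
current.sort(key=lambda r: (r["verified"], r["results"]), reverse=True)

# 3. Unmapped rows are not discarded; they remain visible as coverage backlog.
backlog = [r for r in rows if r["status"] == "Unmapped"]

for r in current:
    print(f'{r["benchmark"]:15} {r["results"]:3} results, {r["verified"]} verified')
```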
Active and saturating benchmarks.
These are the first places to look for present-day model comparisons. Saturating benchmarks are still shown, but with the caveat that successor benchmarks may matter more.
OCRBench v2
Unmapped task. 74 results, 1 verified.
olmOCR-Bench
Document Parsing. 55 results, 0 verified.
OmniDocBench
Document Parsing. 47 results, 11 verified.
Terminal-Bench 2.0
Unmapped task. 20 results, 20 verified.
GPQA
Unmapped task. 17 results, 0 verified.
ParseBench
Document Parsing. 14 results, 14 verified.
SWE-Bench Verified
Code Generation. 39 results, 1 verified.
MATH
Mathematical Reasoning. 29 results, 0 verified.
Benchmarks replace each other.
A leaderboard is only useful if you know whether the benchmark is current, saturated, superseded, or still carrying the field. Codesota treats lineage as part of benchmark quality, not editorial decoration.
Coding Benchmarks
How code-generation evaluation moved from short Python functions to repository-scale software engineering. The attention path tracks where frontier focus has migrated; branches show specialised variants and successors that remain active in their own right.
Agentic AI Benchmarks
How evaluation of AI agents evolved from structured task completion in synthetic environments through real-world software engineering to open-ended computer use. The coding lineage (see coding.json) covers SWE-bench and its successors in depth — this lineage focuses on the broader question of agent-task evaluation: web navigation, API use, desktop control, and the multi-step planning that connects language model capabilities to real-world action. Branches include OSWorld (visual desktop agents) and tau-bench (function-calling reliability).
Mathematical Reasoning Benchmarks
How mathematical reasoning evaluation evolved from grade-school word problems through competition mathematics to research-frontier problems that current AI cannot reliably solve. The lineage traces the shift from linguistic arithmetic (GSM8K) to formal mathematical proof and open research problems. Branches include the AIME competition track, which became a frontier benchmark after o1 broke it open, and FrontierMath, which sources unpublished problems from professional mathematicians.
OCR Benchmarks
How optical character recognition evaluation moved from word-level handwriting transcription to whole-document parsing with tables, charts and layout. The attention path tracks where frontier focus has moved; branches show language-specific forks and metric-isolated variants.
Multimodal Reasoning Benchmarks
How vision-language model evaluation moved beyond visual question answering (covered in the VQA lineage) into multimodal reasoning — science, mathematics, chart understanding, and expert-level perception. When VQA-v2 saturated, the field needed benchmarks that tested whether models could integrate vision and language for genuine reasoning, not pattern matching. This lineage tracks that shift from ScienceQA through MMMU, MathVista, and into the expert-difficulty frontier.
NLP Benchmarks
How natural language understanding evaluation evolved from narrow task-specific tests to multi-task suites, and then was eclipsed by 'reasoning' as the frontier label. GLUE unified disparate NLU tasks; SuperGLUE raised the floor when GLUE saturated; BIG-bench expanded coverage to hundreds of tasks. The shift around 2023 was conceptual as much as technical — once models passed human baselines on NLU tasks, the interesting question became not 'does the model understand language' but 'can it reason'. Branches include SQuAD (reading comprehension), HellaSwag (commonsense completion), and WinoGrande (Winograd schemas).
Visual Question Answering
From the original image+question task to broad multimodal reasoning. The attention path tracks where leaderboard focus has moved; branches show specialized variants that remain active.
Reasoning Benchmarks
How evaluations of language-model reasoning evolved from broad knowledge testing to expert-level problem solving that frontier models still cannot reliably solve. The lineage runs from MMLU's wide-coverage factual sweep through specialist tracks like GPQA, to HLE — a 2,500-question exam designed by domain experts where top models still score below 35%. Branches include BIG-Bench Hard (multi-step reasoning) and ARC-AGI (fluid abstract reasoning), which each probe different failure modes than the main knowledge-testing spine.
Text-to-Speech Benchmarks
How TTS evaluation evolved from single-speaker naturalness datasets toward production benchmarks that test intelligibility, voice similarity, latency, streaming behavior, and information preservation. The lineage separates beauty metrics like MOS from operational metrics such as WER round-trip, critical entity accuracy, and first-byte latency.
Speech Recognition Benchmarks
How automatic speech recognition evaluation evolved from clean read speech on LibriSpeech, through multi-speaker and noisy conditions, toward naturalistic and multilingual benchmarks that reflect real deployment environments. The spine tracks where word error rate evaluation moved as clean-speech performance saturated; branches cover speaker verification (VoxCeleb), noisy conditions (LibriSpeech-other, GigaSpeech), and multilingual evaluation (FLEURS, Common Voice).
Audio Understanding Benchmarks
How audio AI evaluation evolved from environmental sound classification on small datasets through large-scale event detection to foundation-model-era benchmarks that combine audio perception with language understanding. The lineage runs from ESC-50 (2015) through AudioSet (2017) to audio-text retrieval and captioning benchmarks (Clotho, AudioCaps — popularised by the CLAP model), then to VoiceBench and AudioBench which test audio-language model instruction following. Branches include MUSDB18 (music source separation) and MusicNet (symbolic music).
Vision Benchmarks
How computer vision evaluation moved from image classification on ImageNet through object detection and dense prediction on COCO, to open-world promptable segmentation with SA-1B and SA-V. The lineage reflects a structural shift: early benchmarks measured closed-set accuracy on fixed categories; modern benchmarks ask models to segment anything a user points at, including in video. Branches include CIFAR and Pascal VOC (historically important precursors) and ADE20K / Open Images (semantic and large-scale detection offshoots). SAM and SAM 2 are the reference *models* Meta shipped alongside their respective benchmarks — included here only as the systems that established SOTA on each.
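The lineage files mentioned above (for example coding.json) are not reproduced on this page, so the record below is a hypothetical shape only, meant to illustrate how an attention path, branches, and per-benchmark status fit together; the field names are assumptions, while the statuses are taken from the table further down.

```python
# Hypothetical lineage record; the real schema of coding.json is not shown here.
coding_lineage = {
    "area": "Code & Software Engineering",
    # The attention path: where frontier leaderboard focus migrated over time.
    "attention_path": ["HumanEval", "SWE-Bench", "SWE-Bench Verified"],
    # Branches: specialised variants that remain active in their own right.
    "branches": {
        "HumanEval": ["HumanEval+", "MBPP", "MBPP+"],
    },
    # Per-benchmark status, which the catalog joins onto its result rows.
    "status": {
        "HumanEval": "Saturated",
        "HumanEval+": "Active",
        "MBPP": "Saturated",
        "MBPP+": "Active",
        "SWE-Bench": "Superseded",
        "SWE-Bench Verified": "Saturating",
    },
}
```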
Where the result rows are.
| Area | Benchmarks | Results | Verified |
|---|---|---|---|
| Unmapped | 39 | 267 | 38 |
| Vision & Documents | 25 | 228 | 58 |
| Code & Software Engineering | 8 | 101 | 43 |
| Language & Knowledge | 11 | 41 | 0 |
| Structured Data & Forecasting | 3 | 38 | 27 |
| Multimodal Media | 5 | 33 | 30 |
| Robotics, Control & RL | 1 | 12 | 3 |
| Audio & Speech | 7 | 8 | 8 |
All benchmark artifacts.
Sorted by result count. Status comes from curated lineage when available. Unmapped rows stay visible as coverage backlog.
| Benchmark | Task | Metric | Status | Lineage | Year | Results | Verified |
|---|---|---|---|---|---|---|---|
| OCRBench v2 | Unmapped task | overall-en-private | Active | | 2024 | 74 | 1 (1%) |
| olmOCR-Bench | Document Parsing | pass-rate | Active | | 2024 | 55 | 0 (0%) |
| OmniDocBench | Document Parsing | composite | Active | | 2024 | 47 | 11 (23%) |
| SWE-Bench Verified | Code Generation | resolve-rate | Saturating | | 2024 | 39 | 1 (3%) |
| HumanEval | Code Generation | pass@1 | Saturated | | 2021 | 33 | 15 (45%) |
| MATH | Mathematical Reasoning | accuracy | Saturating | | 2021 | 29 | 0 (0%) |
| VQA v2.0 | Visual Question Answering | accuracy | Saturated | | 2017 | 23 | 20 (87%) |
| ImageNet-1K | Image Classification | top-1-accuracy | Saturated | | 2012 | 22 | 6 (27%) |
| Cora | Node Classification | accuracy | Unmapped | N/A | 2000 | 21 | 21 (100%) |
| ABIDE I | Unmapped task | accuracy | Unmapped | N/A | 2012 | 21 | 0 (0%) |
| Terminal-Bench 2.0 | Unmapped task | accuracy | Active | | 2026 | 20 | 20 (100%) |
| MMLU | Unmapped task | accuracy | Saturated | | 2021 | 19 | 0 (0%) |
| Open Graph Benchmark | Node Classification | accuracy-ogbn-arxiv | Unmapped | N/A | 2020 | 17 | 6 (35%) |
| GPQA | Unmapped task | accuracy | Active | | 2024 | 17 | 0 (0%) |
| COCO | Object Detection | mAP | Saturating | | 2014 | 17 | 0 (0%) |
| Atari 2600 | Unmapped task | human-normalized-score | Unmapped | N/A | 2013 | 16 | 1 (6%) |
| CIFAR-100 | Image Classification | accuracy | Unmapped | N/A | 2009 | 15 | 3 (20%) |
| MBPP | Code Generation | pass@1 | Saturated | | 2021 | 14 | 12 (86%) |
| ParseBench | Document Parsing | accuracy | Active | | 2026 | 14 | 14 (100%) |
| FUNSD | Unmapped task | f1 | Saturated | | 2019 | 13 | 13 (100%) |
| ADE20K | Semantic Segmentation | mIoU | Active | | 2016 | 13 | 0 (0%) |
| MuJoCo | Continuous Control | average-return | Unmapped | N/A | 2012 | 12 | 3 (25%) |
| CC-OCR | Unmapped task | multi-scene-f1 | Unmapped | N/A | 2024 | 12 | 0 (0%) |
| MVTec AD | Unmapped task | auroc | Unmapped | N/A | 2019 | 11 | 0 (0%) |
| CIFAR-10 | Image Classification | accuracy | Unmapped | N/A | 2009 | 11 | 8 (73%) |
| IAM | Handwriting Recognition | cer | Active | | 1999 | 8 | 8 (100%) |
| ImageNet Linear Probe | Image Classification | top-1-accuracy | Unmapped | N/A | 2012 | 8 | 5 (63%) |
| KITAB-Bench | Document OCR | cer | Active | | 2024 | 8 | 0 (0%) |
| CheXpert | Unmapped task | auroc | Unmapped | N/A | 2019 | 7 | 0 (0%) |
| MME-VideoOCR | Unmapped task | total-accuracy | Unmapped | N/A | 2024 | 6 | 0 (0%) |
| HumanEval+ | Code Generation | pass@1 | Active | | 2023 | 5 | 5 (100%) |
| GSM8K | Mathematical Reasoning | accuracy | Saturated | | 2021 | 5 | 0 (0%) |
| NoCaps | Image Captioning | cider | Unmapped | N/A | 2019 | 5 | 5 (100%) |
| OK-VQA | Visual Question Answering | accuracy | Active | | 2019 | 5 | 5 (100%) |
| ThaiOCRBench | Document OCR | ted-score | Active | | 2024 | 5 | 0 (0%) |
| AudioSet | Audio Classification | map | Saturating | | 2017 | 4 | 4 (100%) |
| ESC-50 | Audio Classification | accuracy | Saturated | | 2015 | 4 | 4 (100%) |
| MBPP+ | Code Generation | pass@1 | Active | | 2023 | 4 | 4 (100%) |
| ARC-Challenge | Unmapped task | accuracy | Unmapped | N/A | 2018 | 4 | 0 (0%) |
| HellaSwag | Unmapped task | accuracy | Unmapped | N/A | 2019 | 4 | 0 (0%) |
| NIH ChestX-ray14 | Unmapped task | auroc | Unmapped | N/A | 2017 | 4 | 0 (0%) |
| APPS | Code Generation | pass@1 | Unmapped | N/A | 2021 | 3 | 3 (100%) |
| CodeContests | Code Generation | pass@1 | Active | | 2022 | 3 | 3 (100%) |
| AIME 2024 | Mathematical Reasoning | accuracy | Active | | 2024 | 3 | 0 (0%) |
| CommonsenseQA | Unmapped task | accuracy | Unmapped | N/A | 2019 | 3 | 0 (0%) |
| MAWPS | Unmapped task | accuracy | Unmapped | N/A | 2016 | 3 | 0 (0%) |
| MIMIC-CXR | Unmapped task | auroc | Unmapped | N/A | 2019 | 3 | 0 (0%) |
| RLBench | Unmapped task | success-rate | Unmapped | N/A | 2020 | 3 | 3 (100%) |
| Severstal Steel Defect | Unmapped task | dice | Unmapped | N/A | 2019 | 3 | 0 (0%) |
| SVAMP | Unmapped task | accuracy | Unmapped | N/A | 2021 | 3 | 0 (0%) |
| VisA | Unmapped task | auroc | Unmapped | N/A | 2022 | 3 | 0 (0%) |
| WinoGrande | Unmapped task | accuracy | Unmapped | N/A | 2019 | 3 | 0 (0%) |
| Cityscapes | Semantic Segmentation | mIoU | Unmapped | N/A | 2016 | 3 | 3 (100%) |
| LogiQA | Logical Reasoning | accuracy | Unmapped | N/A | 2020 | 2 | 0 (0%) |
| ReClor | Logical Reasoning | accuracy | Unmapped | N/A | 2020 | 2 | 0 (0%) |
| ABIDE II | Unmapped task | accuracy | Unmapped | N/A | 2017 | 2 | 0 (0%) |
| COVID-19 Image Data Collection | Unmapped task | auroc | Unmapped | N/A | 2020 | 2 | 0 (0%) |
| HotpotQA | Unmapped task | f1 | Unmapped | N/A | 2018 | 2 | 0 (0%) |
| RSNA Pneumonia Detection | Unmapped task | map | Unmapped | N/A | 2018 | 2 | 0 (0%) |
| StrategyQA | Unmapped task | accuracy | Unmapped | N/A | 2021 | 2 | 0 (0%) |
| VinDr-CXR | Unmapped task | auroc | Unmapped | N/A | 2022 | 2 | 0 (0%) |
| ImageNet-V2 | Image Classification | top-1-accuracy | Unmapped | N/A | 2019 | 2 | 0 (0%) |
| NEU-DET | Unmapped task | map | Unmapped | N/A | 2013 | 1 | 0 (0%) |
| PadChest | Unmapped task | auroc | Unmapped | N/A | 2020 | 1 | 0 (0%) |
| Weld Defect X-Ray | Unmapped task | map | Unmapped | N/A | 2021 | 1 | 0 (0%) |
| Common Voice | Automatic Speech Recognition | wer | Unmapped | N/A | 2019 | 0 | N/A |
| LibriSpeech | Automatic Speech Recognition | wer-test-clean | Saturated | | 2015 | 0 | N/A |
| LJ Speech | Text-to-Speech | mos | Saturating | | 2017 | 0 | N/A |
| TTS Intelligibility | Text-to-Speech | critical-entity-accuracy | Active | | 2026 | 0 | N/A |
| VCTK | Text-to-Speech | mos | Active | | 2019 | 0 | N/A |
| SWE-Bench | Code Generation | resolve-rate | Superseded | | 2023 | 0 | N/A |
| CNN/DailyMail | Text Summarization | rouge-1 | Unmapped | N/A | 2015 | 0 | N/A |
| CoNLL-2003 | Named Entity Recognition | f1 | Unmapped | N/A | 2003 | 0 | N/A |
| GLUE | Text Classification | average-score | Unmapped | N/A | 2018 | 0 | N/A |
| SNLI | Natural Language Inference | accuracy | Unmapped | N/A | 2015 | 0 | N/A |
| SQuAD v2.0 | Question Answering | f1 | Unmapped | N/A | 2018 | 0 | N/A |
| SuperGLUE | Text Classification | average-score | Unmapped | N/A | 2019 | 0 | N/A |
| COCO Captions | Image Captioning | cider | Unmapped | N/A | 2015 | 0 | N/A |
| GQA | Visual Question Answering | accuracy | Saturated | | 2019 | 0 | N/A |
| M4 Competition | Time-Series Forecasting | smape | Unmapped | N/A | 2018 | 0 | N/A |
| ACDC | Unmapped task | mean-dsc | Unmapped | N/A | 2017 | 0 | N/A |
| BraTS 2023 | Unmapped task | mean-dice-wt-tc-et | Unmapped | N/A | 2023 | 0 | N/A |
| BTCV | Unmapped task | mean-dsc | Unmapped | N/A | 2015 | 0 | N/A |
| DocLayNet | Unmapped task | mAP | Unmapped | N/A | 2022 | 0 | N/A |
| KolektorSDD2 | Unmapped task | auroc | Unmapped | N/A | 2021 | 0 | N/A |
| MVTec 3D-AD | Unmapped task | auroc | Unmapped | N/A | 2021 | 0 | N/A |
| reVISION | Unmapped task | accuracy | Unmapped | N/A | 2025 | 0 | N/A |
| Synapse Multi-Organ CT | Unmapped task | mean-dsc | Unmapped | N/A | 2015 | 0 | N/A |
| CodeSOTA Polish | Document OCR | cer | Unmapped | N/A | 2025 | 0 | N/A |
| CTW1500 | Scene Text Detection | f1 | Unmapped | N/A | 2019 | 0 | N/A |
| ICDAR 2015 | Scene Text Detection | f1 | Unmapped | N/A | 2015 | 0 | N/A |
| ICDAR 2019 ArT | Scene Text Detection | f1 | Unmapped | N/A | 2019 | 0 | N/A |
| IMPACT-PSNC | Document OCR | cer | Unmapped | N/A | 2012 | 0 | N/A |
| Pascal VOC 2012 | Object Detection | mAP | Unmapped | N/A | 2012 | 0 | N/A |
| PolEval 2021 OCR | Document OCR | cer | Unmapped | N/A | 2021 | 0 | N/A |
| Polish EMNIST Extension | Handwriting Recognition | accuracy | Unmapped | N/A | 2020 | 0 | N/A |
| SROIE | Document OCR | f1 | Unmapped | N/A | 2019 | 0 | N/A |
| Total-Text | Scene Text Detection | f1 | Unmapped | N/A | 2017 | 0 | N/A |
| Union14M | Scene Text Detection | accuracy | Unmapped | N/A | 2023 | 0 | N/A |
Add a benchmark or result.
If a benchmark is missing, submit the paper or the leaderboard source. If a row is stale, submit the correction with a source link and the metric definition.