Codesota · Models · Grok 2 · xAI · 4 results · 4 benchmarks
Model card

Grok 2.

xAI · api
§ 01 · Benchmarks

Every benchmark Grok 2 has a recorded score for.

#  | Benchmark | Area · Task                        | Metric   | Value | Rank   | Date | Source
01 | HumanEval | Computer Code · Code Generation    | pass@1   | 88.4% | #22/42 | -    | source
02 | GPQA      | Reasoning · Multi-step Reasoning   | accuracy | 56.0% | #25/33 | -    | source
03 | MMLU      | Reasoning · Commonsense Reasoning  | accuracy | 87.5% | #26/41 | -    | source
04 | MATH      | Reasoning · Mathematical Reasoning | accuracy | 76.1% | #27/34 | -    | source
Rank column shows this model’s position vs all other models scored on the same benchmark + metric (competitors after the slash). #1 in red means current SOTA. Sorted by rank, then newest result.
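As a rough illustration of the rank notation, the sketch below splits a rank string such as `#22/42` into its position and the number after the slash, then orders rows by position. The names `parse_rank` and `sort_by_rank` are hypothetical and not part of Codesota. The "then newest result" tie-break is left out because no dates appear in this copy of the table.

```python
import re

# (benchmark, rank string) pairs copied from the table in this section.
ROWS = [
    ("MATH", "#27/34"),
    ("HumanEval", "#22/42"),
    ("MMLU", "#26/41"),
    ("GPQA", "#25/33"),
]


def parse_rank(rank):
    """Split '#22/42' into (position, competitors).

    Assumes the '#<position>/<competitors>' shape shown in the table.
    """
    match = re.fullmatch(r"#(\d+)/(\d+)", rank)
    if match is None:
        raise ValueError(f"unrecognised rank string: {rank!r}")
    return int(match.group(1)), int(match.group(2))


def sort_by_rank(rows):
    """Order rows by rank position, best (lowest) first."""
    return sorted(rows, key=lambda row: parse_rank(row[1])[0])


for name, rank in sort_by_rank(ROWS):
    position, competitors = parse_rank(rank)
    print(f"{name}: position {position}, {competitors} after the slash")
```

Because `sorted` is stable, rows with equal positions keep their input order, so a date-based tie-break could be applied by pre-sorting on date if that column were available.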
§ 02 · Strengths by area

Where Grok 2 actually performs.

Computer Code: 1 benchmark, avg rank #22.0
Reasoning: 3 benchmarks, avg rank #26.0
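The page does not say how the per-area average rank is computed. A minimal sketch, assuming it is the plain arithmetic mean of the rank positions in each area, reproduces both figures from the benchmark table. The function name `average_rank_by_area` is hypothetical.

```python
from collections import defaultdict

# (area, rank position) pairs taken from the benchmark table in § 01.
RESULTS = [
    ("Computer Code", 22),  # HumanEval
    ("Reasoning", 25),      # GPQA
    ("Reasoning", 26),      # MMLU
    ("Reasoning", 27),      # MATH
]


def average_rank_by_area(results):
    """Group rank positions by area and return the arithmetic mean of each group.

    Assumption: Codesota uses an unweighted mean; the page does not confirm this.
    """
    grouped = defaultdict(list)
    for area, position in results:
        grouped[area].append(position)
    return {area: sum(ranks) / len(ranks) for area, ranks in grouped.items()}


for area, avg in average_rank_by_area(RESULTS).items():
    print(f"{area}: avg rank #{avg:.1f}")
```

Under this assumption the Reasoning figure is (25 + 26 + 27) / 3 = 26.0, and Computer Code is just the single HumanEval position, 22.0.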
§ 04 · Related models

Other xAI models scored on Codesota.

Grok 4: 4 results
Grok 3: 1 result
Grok Code Fast 1: 1 result
Grok-2-1212: 0 results
Grok-3-Beta: 0 results
Grok-3-Mini-Beta: 0 results
Grok-4-Fast: 0 results
Grok-4.1-Fast: 0 results
§ 05 · Sources & freshness

Where these numbers come from.

openai-simple-evals: 4 results
0 of 4 rows marked verified.