Grok 4 · xAI · 15 results · 6 benchmarks
Model card

Grok 4.

xAI · api
§ 02 · Benchmarks

Every benchmark Grok 4 has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | PLCC | Natural Language Processing · Polish Cultural Competency | history | 94.0% | #3/165 | | source |
| 02 | PLCC | Natural Language Processing · Polish Cultural Competency | grammar | 90.0% | #3/165 | | source |
| 03 | LiveCodeBench | Computer Code · Code Generation | pass@1 | 79.0% | #4/30 | | source |
| 04 | PLCC | Natural Language Processing · Polish Cultural Competency | culture-and-tradition | 95.0% | #5/165 | | source |
| 05 | PLCC | Natural Language Processing · Polish Cultural Competency | average | 90.5% | #7/165 | | source |
| 06 | React Native Evals | Mobile Development · React Native Code Generation | animation-satisfaction | 59.4% | #8/10 | | source |
| 07 | React Native Evals | Mobile Development · React Native Code Generation | async-state-satisfaction | 73.8% | #9/10 | | source |
| 08 | React Native Evals | Mobile Development · React Native Code Generation | navigation-satisfaction | 84.4% | #9/10 | | source |
| 09 | React Native Evals | Mobile Development · React Native Code Generation | requirement-satisfaction | 70.1% | #9/10 | | source |
| 10 | GPQA Diamond | Reasoning · Multi-step Reasoning | accuracy | 88.0% | #10/74 | | source |
| 11 | PLCC | Natural Language Processing · Polish Cultural Competency | art-and-entertainment | 86.0% | #10/165 | | source |
| 12 | PLCC | Natural Language Processing · Polish Cultural Competency | geography | 94.0% | #14/165 | | source |
| 13 | PLCC | Natural Language Processing · Polish Cultural Competency | vocabulary | 84.0% | #18/165 | | source |
| 14 | HLE | Reasoning · Multi-step Reasoning | accuracy | 24.5% | #27/74 | | unverified |
| 15 | MMLU | Reasoning · Commonsense Reasoning | accuracy | 86.6% | #38/64 | | source |

The Rank column shows this model's position among all models scored on the same benchmark and metric; the number after the slash is the number of competitors. A rank of #1, shown in red, marks the current SOTA. Rows are sorted by rank, then by newest result.
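
As a rough illustration of how a rank string such as `#3/165` could be read and used for ordering, here is a minimal Python sketch. The `Rank` type, the `parse_rank` helper, and the sample `rows` list are hypothetical names introduced for this example, not part of Codesota. Because the table carries no dates, the sketch sorts on rank alone and does not attempt the "newest result" tie-break.

```python
import re
from typing import NamedTuple


class Rank(NamedTuple):
    position: int     # the model's place on this benchmark + metric
    competitors: int  # the number after the slash


def parse_rank(text: str) -> Rank:
    """Parse a rank string of the form '#<position>/<competitors>'."""
    match = re.fullmatch(r"#(\d+)/(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised rank string: {text!r}")
    return Rank(int(match.group(1)), int(match.group(2)))


# A few (benchmark, metric, rank) rows copied from the table above,
# deliberately out of order.
rows = [
    ("MMLU", "accuracy", "#38/64"),
    ("LiveCodeBench", "pass@1", "#4/30"),
    ("GPQA Diamond", "accuracy", "#10/74"),
    ("PLCC", "history", "#3/165"),
]

# Sort by rank position only; sorted() is stable, so rows with equal
# positions keep their input order.
ordered = sorted(rows, key=lambda row: parse_rank(row[2]).position)

# A row would be highlighted as SOTA only when its position is 1.
is_sota = [parse_rank(row[2]).position == 1 for row in ordered]
```

None of the sampled rows reaches position 1, so `is_sota` is all `False` here.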
§ 03 · Strengths by area

Where Grok 4 actually performs.

Computer Code · 1 benchmark · avg rank #4.0
Natural Language Processing · 1 benchmark · avg rank #8.6
Mobile Development · 1 benchmark · avg rank #8.8
Reasoning · 3 benchmarks · avg rank #25.0
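
The average ranks above appear to be taken over individual result rows rather than over benchmarks: averaging the rank positions from the § 02 table per area reproduces the listed figures. The sketch below shows that arithmetic; the `results` list and the `average_rank_by_area` function are hypothetical names for this example, and the per-row averaging is an inference from the numbers, not a documented Codesota method.

```python
from collections import defaultdict

# (area, rank position) for each of the 15 rows in the benchmark table.
results = [
    ("Natural Language Processing", 3),
    ("Natural Language Processing", 3),
    ("Computer Code", 4),
    ("Natural Language Processing", 5),
    ("Natural Language Processing", 7),
    ("Mobile Development", 8),
    ("Mobile Development", 9),
    ("Mobile Development", 9),
    ("Mobile Development", 9),
    ("Reasoning", 10),
    ("Natural Language Processing", 10),
    ("Natural Language Processing", 14),
    ("Natural Language Processing", 18),
    ("Reasoning", 27),
    ("Reasoning", 38),
]


def average_rank_by_area(rows):
    """Group rank positions by area and return each area's mean rank."""
    grouped = defaultdict(list)
    for area, position in rows:
        grouped[area].append(position)
    return {area: sum(p) / len(p) for area, p in grouped.items()}


averages = average_rank_by_area(results)

# List areas from best (lowest) to worst average rank, one decimal place.
for area, avg in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{area}: avg rank #{avg:.1f}")
```

For Natural Language Processing, for example, the seven PLCC ranks sum to 60, and 60 / 7 ≈ 8.57, which rounds to the #8.6 shown.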
§ 05 · Related models

Other xAI models scored on Codesota.

Grok 2 · 4 results
Grok 3 · 1 result
Grok Code Fast 1 · 1 result
Grok-2-1212 · 0 results
Grok-3-Beta · 0 results
Grok-3-Mini-Beta · 0 results
Grok-4-Fast · 0 results
Grok-4.1-Fast · 0 results
§ 06 · Sources & freshness

Where these numbers come from.

sdadas/PLCC · 7 results
Callstack Incubator · 4 results
xai-grok-4-announcement · 2 results
editorial · 1 result
artificial-analysis · 1 result
11 of 15 rows marked verified.