Gemma 4 31B · Google · 6 results · 5 benchmarks
Model card

Gemma 4 31B.

Google · Large language model

Added from Papers with Code MMLU-Pro refresh on 2026-05-19.

§ 02 · Benchmarks

Every benchmark Gemma 4 31B has a recorded score for.

| #  | Benchmark     | Area · Task                         | Metric   | Value | Rank   | Date       |
|----|---------------|-------------------------------------|----------|-------|--------|------------|
| 01 | MMMU-Pro      | Multimodal · Visual Question Answering | accuracy | 76.9% | #7/31  |            |
| 02 | LiveCodeBench | Computer Code · Code Generation     | pass-1   | 80.0% | #11/24 |            |
| 03 | HLE           | Reasoning · Multi-step Reasoning    | accuracy | 26.5% | #12/36 |            |
| 04 | MMLU-Pro      | Reasoning · Commonsense Reasoning   | accuracy | 85.2% | #21/73 | 2026-04-02 |
| 05 | MMLU-Pro      | Reasoning · Commonsense Reasoning   | accuracy | 85.2% | #21/73 |            |
| 06 | GPQA Diamond  | Reasoning · Multi-step Reasoning    | accuracy | 84.3% | #23/74 |            |
The Rank column shows this model’s position among all models scored on the same benchmark and metric, with the number of competitors after the slash. A #1 in red marks the current SOTA. Rows are sorted by rank, then by newest result.
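The ordering rule can be sketched in a few lines. This is a minimal illustration, not Codesota's implementation: the tuple layout, the `sort_key` and `rank_label` helpers, and the choice to place undated rows after dated rows of the same rank are all assumptions. The rank, competitor count, and date values are copied from the benchmark rows listed for this model.

```python
from datetime import date

# (benchmark, rank, competitors, result date or None), deliberately unordered.
rows = [
    ("GPQA Diamond", 23, 74, None),
    ("MMLU-Pro", 21, 73, None),
    ("HLE", 12, 36, None),
    ("MMLU-Pro", 21, 73, date(2026, 4, 2)),
    ("LiveCodeBench", 11, 24, None),
    ("MMMU-Pro", 7, 31, None),
]


def sort_key(row):
    """Sort by rank ascending, then newest result first.

    Assumption: a row with no date sorts after dated rows of equal rank.
    """
    _, rank, _, day = row
    newest_first = -day.toordinal() if day is not None else 0
    return (rank, day is None, newest_first)


def rank_label(rank, competitors):
    """Format a rank the way the Rank column displays it, e.g. '#7/31'."""
    return f"#{rank}/{competitors}"


ordered = sorted(rows, key=sort_key)
for benchmark, rank, competitors, day in ordered:
    print(benchmark, rank_label(rank, competitors), day or "")
```

Under these assumptions the dated MMLU-Pro row lands ahead of the undated one, which matches the order the rows appear in on this page.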
§ 03 · Strengths by area

Where Gemma 4 31B actually performs.

Multimodal · 1 benchmark · avg rank #7.0
Computer Code · 1 benchmark · avg rank #11.0
Reasoning · 3 benchmarks · avg rank #19.3
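The per-area figures can be reproduced from the ranks recorded for this model. The sketch below is an inference from the numbers, not documented Codesota behavior: it assumes the benchmark count uses distinct benchmark names while the average runs over every result row, so MMLU-Pro (two rows at rank 21) counts once as a benchmark but twice in the mean. The `summarize` helper is hypothetical.

```python
from collections import defaultdict

# (area, benchmark, rank) for each of the six recorded results.
results = [
    ("Multimodal", "MMMU-Pro", 7),
    ("Computer Code", "LiveCodeBench", 11),
    ("Reasoning", "HLE", 12),
    ("Reasoning", "MMLU-Pro", 21),
    ("Reasoning", "MMLU-Pro", 21),
    ("Reasoning", "GPQA Diamond", 23),
]


def summarize(rows):
    """Return {area: (distinct benchmark count, mean rank over all rows)}."""
    ranks = defaultdict(list)
    benchmarks = defaultdict(set)
    for area, benchmark, rank in rows:
        ranks[area].append(rank)
        benchmarks[area].add(benchmark)
    return {
        area: (len(benchmarks[area]), sum(values) / len(values))
        for area, values in ranks.items()
    }


summary = summarize(results)
for area, (count, mean_rank) in summary.items():
    print(area, count, mean_rank)
```

For Reasoning this gives 3 benchmarks and a mean of 19.25, in line with the displayed #19.3 if the site rounds half up. Averaging over the three distinct benchmarks instead would give about 18.7, which does not match.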
§ 04 · Papers

1 paper with results for Gemma 4 31B.

  1. 2026-04-02 · 5 results

    Gemma 4: Byte for byte, the most capable open models

§ 05 · Related models

Other Google models scored on Codesota.

Gemini 2.5 Pro
16 results · 2 SOTA
Gemini 1.5 Pro
14 results · 1 SOTA
Gemini 3 Pro
Undisclosed params · 12 results · 1 SOTA
ViT-H/14
632M params · 2 results · 1 SOTA
CoCa (finetuned)
2.1B params · 1 result · 1 SOTA
Gemini 2.0 Flash
1 result · 1 SOTA
Noise2Music
Unknown params · 1 result · 1 SOTA
Gemini 3 Flash
Undisclosed params · 6 results
§ 06 · Sources & freshness

Where these numbers come from.

pwc-dump · 5 results
paperswithcode · 1 result
0 of 6 rows marked verified.