Llama 4 Maverick · Meta · 15 results · 9 benchmarks
Model card

Llama 4 Maverick.

Meta · open-source · 400B total / 17B active (128 experts) params · Mixture-of-Experts Transformer

Meta Llama 4 Maverick. 128-expert MoE. Released April 2025. Context: 1M tokens.

§ 02 · Benchmarks

Every benchmark Llama 4 Maverick has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank |
|---|---|---|---|---|---|
| 01 | ARC-Challenge | Reasoning · Commonsense Reasoning | accuracy | 97.4% | #3/10 |
| 02 | GSM8K | Reasoning · Mathematical Reasoning | accuracy | 98.7% | #8/48 |
| 03 | MBPP | Computer Code · Code Generation | pass@1 | 77.6% | #12/19 |
| 04 | MATH | Reasoning · Mathematical Reasoning | accuracy | 89.4% | #19/46 |
| 05 | MMLU | Reasoning · Commonsense Reasoning | accuracy | 89.4% | #19/64 |
| 06 | LiveCodeBench | Computer Code · Code Generation | pass@1 | 43.4% | #23/30 |
| 07 | GPQA Diamond | Reasoning · Multi-step Reasoning | accuracy | 69.8% | #51/74 |
| 08 | HLE | Reasoning · Multi-step Reasoning | accuracy | 5.7% | #64/74 |
| 09 | PLCC | Natural Language Processing · Polish Cultural Competency | history | 76.0% | #73/165 |
| 10 | PLCC | Natural Language Processing · Polish Cultural Competency | grammar | 59.0% | #82/165 |
| 11 | PLCC | Natural Language Processing · Polish Cultural Competency | geography | 71.0% | #88/165 |
| 12 | PLCC | Natural Language Processing · Polish Cultural Competency | average | 58.2% | #95/165 |
| 13 | PLCC | Natural Language Processing · Polish Cultural Competency | art-and-entertainment | 46.0% | #97/165 |
| 14 | PLCC | Natural Language Processing · Polish Cultural Competency | culture-and-tradition | 52.0% | #102/165 |
| 15 | PLCC | Natural Language Processing · Polish Cultural Competency | vocabulary | 45.0% | #105/165 |

The Rank column shows this model's position against all other models scored on the same benchmark and metric (number of competitors after the slash). #1 means current SOTA. Rows are sorted by rank, then by newest result.

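The rank notation described above can be read mechanically. The following is a minimal sketch, assuming each Rank cell has the form `#position/competitors` as in the table; the names `Rank`, `parse_rank`, and `is_sota` are hypothetical helpers for illustration, not part of any Codesota tooling.

```python
import re
from typing import NamedTuple


class Rank(NamedTuple):
    position: int      # number before the slash
    competitors: int   # number after the slash


# Assumed cell format: '#', digits, '/', digits (e.g. "#3/10").
_RANK_RE = re.compile(r"^#(\d+)/(\d+)$")


def parse_rank(cell: str) -> Rank:
    """Split a Rank cell such as '#3/10' into its two integers."""
    match = _RANK_RE.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized rank cell: {cell!r}")
    return Rank(int(match.group(1)), int(match.group(2)))


def is_sota(rank: Rank) -> bool:
    """A position of 1 marks the current SOTA for that benchmark and metric."""
    return rank.position == 1


if __name__ == "__main__":
    arc = parse_rank("#3/10")
    print(arc.position, arc.competitors, is_sota(arc))
```

By this reading, none of the 15 rows for Llama 4 Maverick is a SOTA result, since the best position in the table is #3.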
§ 03 · Strengths by area

Where Llama 4 Maverick actually performs.

Computer Code · 2 benchmarks · avg rank #17.5
Reasoning · 6 benchmarks · avg rank #27.3
Natural Language Processing · 1 benchmark · avg rank #91.7

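The per-area averages appear to be the plain mean of the rank positions of every row in that area. The sketch below is a reconstruction under that assumption, not the site's documented method; the rank values are copied from the § 02 table and `average_rank_by_area` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# (area, rank position) for each of the 15 rows in the § 02 table.
ROWS = [
    ("Reasoning", 3),
    ("Reasoning", 8),
    ("Computer Code", 12),
    ("Reasoning", 19),
    ("Reasoning", 19),
    ("Computer Code", 23),
    ("Reasoning", 51),
    ("Reasoning", 64),
    ("Natural Language Processing", 73),
    ("Natural Language Processing", 82),
    ("Natural Language Processing", 88),
    ("Natural Language Processing", 95),
    ("Natural Language Processing", 97),
    ("Natural Language Processing", 102),
    ("Natural Language Processing", 105),
]


def average_rank_by_area(rows):
    """Group rank positions by area and return the mean, rounded to one decimal."""
    ranks = defaultdict(list)
    for area, position in rows:
        ranks[area].append(position)
    return {area: round(mean(positions), 1) for area, positions in ranks.items()}


if __name__ == "__main__":
    for area, avg in average_rank_by_area(ROWS).items():
        print(f"{area}: avg rank #{avg}")
```

Under this assumption the Natural Language Processing figure averages over the seven PLCC metric rows, even though they count as a single benchmark. The mean also ignores how many competitors each benchmark has, so #73 of 165 and #51 of 74 are not directly comparable.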
§ 05 · Related models

Other Meta models scored on Codesota.

Llama 3 70B · 9 results
Llama 3 (405B, Instruct) · 7 results
Llama 3.1 405B · 7 results
Llama 3.1 70B · 4 results
Code Llama 34B · Unknown params · 2 results
ConvNeXt V2 Huge · 650M params · 2 results
DeiT-B Distilled · 86M params · 2 results
Muse Spark · 2 results

§ 06 · Sources & freshness

Where these numbers come from.

sdadas/PLCC · 7 results
meta-blog · 5 results
meta-model-card · 2 results
scale-hle-official · 1 result

15 of 15 rows marked verified.