Codesota · Models · Llama-4-Maverick · Meta · 14 results · 8 benchmarks
Model card

Llama-4-Maverick.

Meta · open-source · 400B total / 17B active params (128 experts) · Mixture-of-Experts Transformer

Meta Llama 4 Maverick. 128-expert MoE. Released April 2025. Context: 1M tokens.

§ 01 · Benchmarks

Every benchmark Llama-4-Maverick has a recorded score for.

| #  | Benchmark     | Area · Task                                               | Metric                | Value | Rank     | Date | Source   |
|----|---------------|-----------------------------------------------------------|-----------------------|-------|----------|------|----------|
| 01 | ARC-Challenge | Reasoning · Commonsense Reasoning                         | accuracy              | 97.4% | #3/10    |      | source ↗ |
| 02 | GSM8K         | Reasoning · Mathematical Reasoning                        | accuracy              | 98.7% | #7/32    |      | source ↗ |
| 03 | MBPP          | Computer Code · Code Generation                           | pass@1                | 77.6% | #12/19   |      | source ↗ |
| 04 | MMLU          | Reasoning · Commonsense Reasoning                         | accuracy              | 89.4% | #18/41   |      | source ↗ |
| 05 | GPQA          | Reasoning · Multi-step Reasoning                          | accuracy              | 69.8% | #19/33   |      | source ↗ |
| 06 | MATH          | Reasoning · Mathematical Reasoning                        | accuracy              | 89.4% | #19/34   |      | source ↗ |
| 07 | LiveCodeBench | Computer Code · Code Generation                           | pass@1                | 43.4% | #23/30   |      | source ↗ |
| 08 | PLCC          | Natural Language Processing · Polish Cultural Competency  | history               | 76.0% | #73/165  |      | source ↗ |
| 09 | PLCC          | Natural Language Processing · Polish Cultural Competency  | grammar               | 59.0% | #82/165  |      | source ↗ |
| 10 | PLCC          | Natural Language Processing · Polish Cultural Competency  | geography             | 71.0% | #88/165  |      | source ↗ |
| 11 | PLCC          | Natural Language Processing · Polish Cultural Competency  | average               | 58.2% | #95/165  |      | source ↗ |
| 12 | PLCC          | Natural Language Processing · Polish Cultural Competency  | art-and-entertainment | 46.0% | #97/165  |      | source ↗ |
| 13 | PLCC          | Natural Language Processing · Polish Cultural Competency  | culture-and-tradition | 52.0% | #102/165 |      | source ↗ |
| 14 | PLCC          | Natural Language Processing · Polish Cultural Competency  | vocabulary            | 45.0% | #105/165 |      | source ↗ |

The Rank column shows this model’s position among all models scored on the same benchmark and metric; the number after the slash is the count of competitors. A #1 shown in red marks the current SOTA. Rows are sorted by rank, then by newest result.
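
The rank notation described above can be read mechanically. The following is a minimal sketch, not Codesota's own code: the `Rank` type and `parse_rank` helper are hypothetical names introduced here, and the sample rows are copied from the table. It splits a cell such as `#7/32` into its two integers and sorts rows by position, which is the primary sort key the page describes.

```python
import re
from typing import NamedTuple


class Rank(NamedTuple):
    position: int  # this model's place on the benchmark + metric
    field: int     # the number printed after the slash


def parse_rank(cell: str) -> Rank:
    """Parse a rank cell such as '#7/32' into its two integers."""
    match = re.fullmatch(r"#(\d+)/(\d+)", cell.strip())
    if match is None:
        raise ValueError(f"unrecognised rank cell: {cell!r}")
    return Rank(int(match.group(1)), int(match.group(2)))


# Three rows from the benchmark table, deliberately out of order.
rows = [("GSM8K", "#7/32"), ("ARC-Challenge", "#3/10"), ("MBPP", "#12/19")]

# Sort by position only; the "then newest result" tie-break is omitted
# because the page shows no dates for these rows.
ordered = sorted(rows, key=lambda row: parse_rank(row[1]).position)
print([name for name, _ in ordered])  # ['ARC-Challenge', 'GSM8K', 'MBPP']
```

Because the field size differs so much between benchmarks (10 for ARC-Challenge, 165 for PLCC), keeping both numbers makes it easy to compare a raw position against the number of models it was measured against.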
§ 02 · Strengths by area

Where Llama-4-Maverick actually performs.

Reasoning
5 benchmarks · avg rank #13.2
Computer Code
2 benchmarks · avg rank #17.5
Natural Language Processing
1 benchmark · avg rank #91.7
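
The per-area averages can be reproduced from the Rank column. The sketch below assumes the site takes the plain mean of the rank positions of every result row in an area; the page does not state this, but the numbers match. The function name `average_rank_by_area` is hypothetical. The Natural Language Processing figure averages all seven PLCC rows even though they count as one benchmark.

```python
from collections import defaultdict

# (area, rank position) for each of the 14 rows in the benchmark table.
RESULTS = [
    ("Reasoning", 3), ("Reasoning", 7), ("Reasoning", 18),
    ("Reasoning", 19), ("Reasoning", 19),
    ("Computer Code", 12), ("Computer Code", 23),
    ("Natural Language Processing", 73), ("Natural Language Processing", 82),
    ("Natural Language Processing", 88), ("Natural Language Processing", 95),
    ("Natural Language Processing", 97), ("Natural Language Processing", 102),
    ("Natural Language Processing", 105),
]


def average_rank_by_area(rows):
    """Return the mean rank position per area, rounded to one decimal."""
    grouped = defaultdict(list)
    for area, position in rows:
        grouped[area].append(position)
    return {
        area: round(sum(positions) / len(positions), 1)
        for area, positions in grouped.items()
    }


for area, avg in average_rank_by_area(RESULTS).items():
    print(f"{area}: avg rank #{avg}")
```

Averaging raw positions ignores how many models each benchmark ranks, so #91.7 out of 165 PLCC entrants and #13.2 across fields of 10 to 41 models are not directly comparable.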
§ 04 · Related models

Other Meta models scored on Codesota.

DeiT-B Distilled
86M params · 2 results · 1 SOTA
Llama 3 70B
8 results
Llama 3.1 405B
6 results
Llama 3.1 70B
4 results
Code Llama 34B
Unknown params · 2 results
ConvNeXt V2 Huge
650M params · 2 results
CodeLlama 70B
70B params · 1 result
ConvNeXt V2 Base
89M params · 1 result
§ 05 · Sources & freshness

Where these numbers come from.

sdadas/PLCC · 7 results
meta-blog · 5 results
meta-model-card · 2 results
14 of 14 rows marked verified.