Codesota · Models · Llama 3.1 405B · Meta · 13 results · 12 benchmarks
Model card

Llama 3.1 405B.

Meta · open-source

Meta Llama 3.1, the 405B-parameter instruct variant. Released July 2024.

§ 02 · Benchmarks

Every benchmark Llama 3.1 405B has a recorded score for.

| #  | Benchmark      | Area · Task                                             | Metric        | Value | Rank   | Date       |
|----|----------------|---------------------------------------------------------|---------------|-------|--------|------------|
| 01 | CNN/DailyMail  | Natural Language Processing · Text Summarization        | rouge-1       | 45.1% | #4/6   | 2024-07-31 |
| 02 | CNN/DailyMail  | Natural Language Processing · Text Summarization        | rouge-l       | 42.3% | #4/7   | 2024-07-31 |
| 03 | CoNLL-2003     | Natural Language Processing · Named Entity Recognition  | f1            | 90.6% | #4/7   | 2024-07-31 |
| 04 | SNLI           | Natural Language Processing · Natural Language Inference | accuracy      | 91.2% | #5/8   | 2024-07-31 |
| 05 | HellaSwag      | Reasoning · Commonsense Reasoning                       | accuracy      | 89.0% | #5/17  |            |
| 06 | SuperGLUE      | Natural Language Processing · Text classification       | average-score | 86.7% | #6/7   | 2024-07-31 |
| 07 | ARC-Challenge  | Reasoning · Commonsense Reasoning                       | accuracy      | 96.9% | #6/10  |            |
| 08 | BIG-Bench Hard | Reasoning · Multi-step Reasoning                        | accuracy      | 85.9% | #7/11  |            |
| 09 | SQuAD v2.0     | Natural Language Processing · Question Answering        | f1            | 88.7% | #15/26 | 2024-07-31 |
| 10 | HumanEval      | Computer Code · Code Generation                         | pass@1        | 89.0% | #20/42 |            |
| 11 | MMLU           | Reasoning · Commonsense Reasoning                       | accuracy      | 88.6% | #22/64 |            |
| 12 | MATH           | Reasoning · Mathematical Reasoning                      | accuracy      | 73.8% | #31/46 |            |
| 13 | GPQA Diamond   | Reasoning · Multi-step Reasoning                        | accuracy      | 50.7% | #62/74 |            |

The Rank column shows this model's position among all models scored on the same benchmark and metric; the number after the slash is the number of competitors. A rank of #1 (shown in red) marks the current SOTA. Rows are sorted by rank, then by newest result.
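
The ordering rule described above (rank first, then newest result) can be sketched as a two-level sort. This is only an illustration on a handful of rows from the table, not Codesota's implementation. The tuple layout and the `sort_key` helper are hypothetical, and placing undated rows after dated rows of the same rank is an assumption inferred from the order shown on the page.

```python
from datetime import date

# (benchmark, metric, rank, number after the slash, result date or None)
# Values are taken from the benchmark table; the tuple layout is illustrative.
rows = [
    ("MMLU", "accuracy", 22, 64, None),
    ("SNLI", "accuracy", 5, 8, date(2024, 7, 31)),
    ("HellaSwag", "accuracy", 5, 17, None),
    ("CoNLL-2003", "f1", 4, 7, date(2024, 7, 31)),
    ("GPQA Diamond", "accuracy", 62, 74, None),
]

def sort_key(row):
    """Sort by rank ascending, then newest date first."""
    _, _, rank, _, day = row
    # Assumption: rows without a date come after dated rows of equal rank.
    newest_first = -day.toordinal() if day is not None else 0
    return (rank, day is None, newest_first)

ordered = sorted(rows, key=sort_key)
for name, metric, rank, out_of, day in ordered:
    print(f"#{rank}/{out_of}  {name} ({metric})  {day or ''}")
```

With these rows, SNLI (dated) lands ahead of HellaSwag (undated) even though both are ranked #5, which matches the order of rows 04 and 05 in the table.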
§ 03 · Strengths by area

Where Llama 3.1 405B actually performs.

Natural Language Processing · 1 benchmark · avg rank #6.0
Natural Language Processing · 4 benchmarks · avg rank #6.4
Computer Code · 1 benchmark · avg rank #20.0
Reasoning · 6 benchmarks · avg rank #22.2
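
The per-area averages appear to be the mean of the Rank values over every scored row in an area, rounded to one decimal. The sketch below checks that reading against the ranks in § 02. It rests on one assumption the page does not state: that SuperGLUE is the one-benchmark Natural Language Processing group and the other four NLP benchmarks form the second group. The group labels used as dictionary keys are hypothetical.

```python
from collections import defaultdict

# (group, benchmark, rank) for every row of the benchmark table.
# Assumption: SuperGLUE is the separate one-benchmark NLP group.
rows = [
    ("NLP (4 benchmarks)", "CNN/DailyMail", 4),   # rouge-1
    ("NLP (4 benchmarks)", "CNN/DailyMail", 4),   # rouge-l
    ("NLP (4 benchmarks)", "CoNLL-2003", 4),
    ("NLP (4 benchmarks)", "SNLI", 5),
    ("NLP (4 benchmarks)", "SQuAD v2.0", 15),
    ("NLP (1 benchmark)", "SuperGLUE", 6),
    ("Computer Code", "HumanEval", 20),
    ("Reasoning", "HellaSwag", 5),
    ("Reasoning", "ARC-Challenge", 6),
    ("Reasoning", "BIG-Bench Hard", 7),
    ("Reasoning", "MMLU", 22),
    ("Reasoning", "MATH", 31),
    ("Reasoning", "GPQA Diamond", 62),
]

ranks = defaultdict(list)       # group -> rank of every scored row
benchmarks = defaultdict(set)   # group -> distinct benchmark names
for group, benchmark, rank in rows:
    ranks[group].append(rank)
    benchmarks[group].add(benchmark)

# Benchmark count is per distinct benchmark; the mean is per row.
summary = {
    group: (len(benchmarks[group]), round(sum(r) / len(r), 1))
    for group, r in ranks.items()
}
for group, (count, avg) in summary.items():
    print(f"{group}: {count} benchmark(s), avg rank #{avg}")
```

Under that grouping the computed values agree with the four figures listed above. Note that CNN/DailyMail contributes two rows (rouge-1 and rouge-l) to the mean but counts as a single benchmark.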
§ 04 · Papers

1 paper with results for Llama 3.1 405B.

  1. The Llama 3 Herd of Models · 2024-07-31 · Natural Language Processing · 6 results

§ 05 · Related models

Other Meta models scored on Codesota.

Llama 3 70B · 9 results
Llama 3 (405B, Instruct) · 7 results
Llama 4 Maverick · 400B total / 17B active (128 experts) params · 7 results
Llama 3.1 70B · 4 results
Code Llama 34B · Unknown params · 2 results
ConvNeXt V2 Huge · 650M params · 2 results
DeiT-B Distilled · 86M params · 2 results
Muse Spark · 2 results
§ 06 · Sources & freshness

Where these numbers come from.

arxiv · 6 results
openai-simple-evals · 4 results
meta-modelcard · 2 results
llm-stats-bbh · 1 result
9 of 13 rows marked verified.