Codesota · Models · Llama 3 70B · Meta · 11 results · 11 benchmarks
Model card

Llama 3 70B.

Meta · open-source · LLM

Meta Llama 3, 70B-parameter instruct variant. Released April 2024.

§ 02 · Benchmarks

Every benchmark Llama 3 70B has a recorded score for.

| #  | Benchmark     | Area · Task                                              | Metric   | Value | Rank   | Date       |
|----|---------------|----------------------------------------------------------|----------|-------|--------|------------|
| 01 | CommonsenseQA | Reasoning · Commonsense Reasoning                        | accuracy | 80.9% | #3/5   |            |
| 02 | MAWPS         | Reasoning · Arithmetic Reasoning                         | accuracy | 94.1% | #3/3   |            |
| 03 | SVAMP         | Reasoning · Arithmetic Reasoning                         | accuracy | 89.5% | #3/3   |            |
| 04 | WinoGrande    | Reasoning · Commonsense Reasoning                        | accuracy | 85.3% | #3/13  |            |
| 05 | CoNLL-2003    | Natural Language Processing · Named Entity Recognition   | F1       | 89.3% | #6/7   | 2024-07-31 |
| 06 | SNLI          | Natural Language Processing · Natural Language Inference | accuracy | 89.7% | #7/8   | 2024-07-31 |
| 07 | HellaSwag     | Reasoning · Commonsense Reasoning                        | accuracy | 88.0% | #7/17  |            |
| 08 | ARC-Challenge | Reasoning · Commonsense Reasoning                        | accuracy | 93.0% | #10/10 |            |
| 09 | SQuAD v2.0    | Natural Language Processing · Question Answering         | F1       | 85.3% | #23/26 | 2024-07-31 |
| 10 | GSM8K         | Reasoning · Mathematical Reasoning                       | accuracy | 93.0% | #30/48 |            |
| 11 | HumanEval     | Computer Code · Code Generation                          | pass@1   | 81.7% | #34/42 |            |

The Rank column shows this model’s position among all models scored on the same benchmark and metric (competitors after the slash). A #1 in red marks the current SOTA. Rows are sorted by rank, then by newest result.

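The rank notation and sort order described above can be sketched in a few lines of Python. This is a minimal illustration, not Codesota's implementation: the `Row` type, `parse_rank`, and `sort_key` are hypothetical names, and placing undated rows after dated ones within the same rank is an assumption inferred from the table order.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class Row(NamedTuple):
    """One benchmark row (hypothetical structure, for illustration only)."""
    benchmark: str
    rank: str                # e.g. "#3/13"
    dated: Optional[date]    # None when the table shows no date


def parse_rank(rank: str) -> tuple[int, int]:
    """Split a rank string like '#3/13' into (position, number after the slash)."""
    match = re.fullmatch(r"#(\d+)/(\d+)", rank)
    if match is None:
        raise ValueError(f"unrecognised rank string: {rank!r}")
    return int(match.group(1)), int(match.group(2))


def sort_key(row: Row) -> tuple[int, bool, int]:
    """Order by rank position, then newest date first.

    Assumption: rows without a date sort after dated rows of the same rank.
    """
    position, _ = parse_rank(row.rank)
    newest_first = -row.dated.toordinal() if row.dated else 0
    return (position, row.dated is None, newest_first)


# A few rows copied from the table above.
rows = [
    Row("HellaSwag", "#7/17", None),
    Row("SNLI", "#7/8", date(2024, 7, 31)),
    Row("WinoGrande", "#3/13", None),
    Row("CoNLL-2003", "#6/7", date(2024, 7, 31)),
]

ordered = [row.benchmark for row in sorted(rows, key=sort_key)]
print(ordered)
```

With these four rows, SNLI sorts ahead of HellaSwag because both sit at position 7 and only SNLI carries a date.
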
§ 03 · Strengths by area

Where Llama 3 70B actually performs.

Reasoning · 7 benchmarks · avg rank #8.4
Natural Language Processing · 3 benchmarks · avg rank #12.0
Computer Code · 1 benchmark · avg rank #34.0

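The per-area averages can be reproduced from the rank positions in the benchmark table. The sketch below assumes a plain arithmetic mean rounded to one decimal place; the page does not state its formula, but this assumption matches all three reported figures. `RANKS` and `average_rank_by_area` are hypothetical names.

```python
from collections import defaultdict

# (area, rank position) pairs copied from the benchmark table.
RANKS = [
    ("Reasoning", 3),                     # CommonsenseQA
    ("Reasoning", 3),                     # MAWPS
    ("Reasoning", 3),                     # SVAMP
    ("Reasoning", 3),                     # WinoGrande
    ("Natural Language Processing", 6),   # CoNLL-2003
    ("Natural Language Processing", 7),   # SNLI
    ("Reasoning", 7),                     # HellaSwag
    ("Reasoning", 10),                    # ARC-Challenge
    ("Natural Language Processing", 23),  # SQuAD v2.0
    ("Reasoning", 30),                    # GSM8K
    ("Computer Code", 34),                # HumanEval
]


def average_rank_by_area(pairs):
    """Group rank positions by area and return the mean per area (1 decimal)."""
    grouped = defaultdict(list)
    for area, position in pairs:
        grouped[area].append(position)
    return {
        area: round(sum(positions) / len(positions), 1)
        for area, positions in grouped.items()
    }


averages = average_rank_by_area(RANKS)
for area, avg in averages.items():
    print(f"{area}: avg rank #{avg}")
```

For Reasoning, the seven positions sum to 59, and 59 / 7 ≈ 8.43, which rounds to the #8.4 shown above.
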
§ 04 · Papers

1 paper with results for Llama 3 70B.

  1. The Llama 3 Herd of Models
     2024-07-31 · Natural Language Processing · 3 results

§ 05 · Related models

Other Meta models scored on Codesota.

Llama 3 (405B, Instruct) · 7 results
Llama 3.1 405B · 7 results
Llama 4 Maverick · 400B total / 17B active (128 experts) params · 7 results
Llama 3.1 70B · 4 results
Code Llama 34B · Unknown params · 2 results
ConvNeXt V2 Huge · 650M params · 2 results
DeiT-B Distilled · 86M params · 2 results
Muse Spark · 2 results

§ 06 · Sources & freshness

Where these numbers come from.

meta-blog · 7 results
arxiv · 3 results
openai-simple-evals · 1 result

3 of 11 rows marked verified.