Codesota · Models · Llama 2 70B (5-shot)
6 results · 6 benchmarks
Model card

Llama 2 70B (5-shot).

unknown · 1 current SOTA
§ 02 · Benchmarks

Every benchmark Llama 2 70B (5-shot) has a recorded score for.

#  | Benchmark         | Area · Task                                        | Metric   | Value | Rank   | Date | Source
01 | TriviaQA          | Natural Language Processing · Question Answering   | accuracy | 85.0% | #1/4   |      | source ↗
02 | Natural Questions | Natural Language Processing · Question Answering   | accuracy | 33.0% | #2/5   |      | source ↗
03 | HumanEval+        | Computer Code · Code Generation                    | pass-1   | 29.9% | #9/12  |      | source ↗
04 | BIG-Bench Hard    | Reasoning · Multi-step Reasoning                   | accuracy | 51.2% | #10/11 |      | source ↗
05 | GSM8K             | Reasoning · Mathematical Reasoning                 | accuracy | 56.8% | #45/48 |      | source ↗
06 | MMLU              | Reasoning · Commonsense Reasoning                  | accuracy | 68.9% | #55/64 |      | source ↗
The Rank column shows this model’s position against all other models scored on the same benchmark and metric, with competitors after the slash. A #1 rank (red on the site) marks the current SOTA. Rows are sorted by rank, then by newest result.
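The rank notation can be read mechanically. The following is a minimal sketch, not the site's own code: `parse_rank` and the `rows` list are hypothetical names, and the rank cells are copied from the benchmark table on this page. It splits a cell such as `#9/12` into the position and the number after the slash, sorts by position, and picks out SOTA entries. The "newest result" tiebreak is left out because the rows here carry no dates.

```python
def parse_rank(cell: str) -> tuple[int, int]:
    """Split a rank cell like '#9/12' into (position, number after the slash)."""
    position, _, after_slash = cell.lstrip("#").partition("/")
    return int(position), int(after_slash)


# (benchmark, rank cell) pairs copied from the table, deliberately out of order.
rows = [
    ("MMLU", "#55/64"),
    ("TriviaQA", "#1/4"),
    ("GSM8K", "#45/48"),
    ("HumanEval+", "#9/12"),
    ("Natural Questions", "#2/5"),
    ("BIG-Bench Hard", "#10/11"),
]

# Sort by position only; a date tiebreak would need a date per row.
sorted_rows = sorted(rows, key=lambda row: parse_rank(row[1])[0])

# Position 1 is what the page labels "current SOTA".
sota = [name for name, cell in rows if parse_rank(cell)[0] == 1]
```

With these inputs the sorted order matches the table (TriviaQA first, MMLU last), and `sota` contains only TriviaQA.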
§ 03 · Strengths by area

Where Llama 2 70B (5-shot) actually performs.

Natural Language Processing: 2 benchmarks · avg rank #1.5 · 1 SOTA
Computer Code: 1 benchmark · avg rank #9.0
Reasoning: 3 benchmarks · avg rank #36.7
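The per-area averages are consistent with a plain arithmetic mean of the rank positions in the benchmark table. A minimal sketch under that assumption (`average_rank_by_area` and `RANKS` are hypothetical names; the page does not say how the site computes the figure):

```python
from collections import defaultdict

# (area, rank position) pairs taken from the benchmark table on this page.
RANKS = [
    ("Natural Language Processing", 1),   # TriviaQA
    ("Natural Language Processing", 2),   # Natural Questions
    ("Computer Code", 9),                 # HumanEval+
    ("Reasoning", 10),                    # BIG-Bench Hard
    ("Reasoning", 45),                    # GSM8K
    ("Reasoning", 55),                    # MMLU
]


def average_rank_by_area(pairs):
    """Group rank positions by area and return the mean, rounded to one decimal."""
    grouped = defaultdict(list)
    for area, rank in pairs:
        grouped[area].append(rank)
    return {area: round(sum(ranks) / len(ranks), 1) for area, ranks in grouped.items()}


averages = average_rank_by_area(RANKS)
```

For Reasoning this gives (10 + 45 + 55) / 3 ≈ 36.7, which matches the figure shown, so a small number of low ranks on crowded benchmarks dominates that area's average.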
§ 04 · Papers

1 paper with results for Llama 2 70B (5-shot).

  1. 2023-07-18 · Llama 2: Open Foundation and Fine-Tuned Chat Models · 6 results

§ 06 · Sources & freshness

Where these numbers come from.

pwc-dump: 6 results
0 of 6 rows marked verified.