Codesota · Models · LLaMA-65B
9 results · 9 benchmarks
Model card

LLaMA-65B.

unknown · 1 current SOTA
§ 02 · Benchmarks

Every benchmark LLaMA-65B has a recorded score for.

| #  | Benchmark         | Area · Task                                        | Metric   | Value | Rank   |
|----|-------------------|----------------------------------------------------|----------|-------|--------|
| 01 | Natural Questions | Natural Language Processing · Question Answering   | accuracy | 39.9% | #1/5   |
| 02 | TriviaQA          | Natural Language Processing · Question Answering   | accuracy | 73.0% | #2/4   |
| 03 | WinoGrande        | Reasoning · Commonsense Reasoning                  | accuracy | 77.0% | #7/13  |
| 04 | HellaSwag         | Reasoning · Commonsense Reasoning                  | accuracy | 84.2% | #8/17  |
| 05 | MBPP+             | Computer Code · Code Generation                    | pass-1   | 37.7% | #9/9   |
| 06 | HumanEval+        | Computer Code · Code Generation                    | pass-1   | 23.7% | #10/12 |
| 07 | GSM8K             | Reasoning · Mathematical Reasoning                 | accuracy | 69.7% | #41/48 |
| 08 | MATH              | Reasoning · Mathematical Reasoning                 | accuracy | 20.5% | #45/46 |
| 09 | MMLU              | Reasoning · Commonsense Reasoning                  | accuracy | 63.4% | #58/64 |
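The Rank column shows this model's position among all models scored on the same benchmark and metric. The number after the slash is the size of that ranked field, this model included (as rows such as #9/9 imply). A rank of #1 marks the current SOTA. Rows are sorted by rank, then by newest result.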
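The rank notation can be read mechanically. The following is a minimal sketch, assuming every rank cell has the `#position/total` shape seen in the table and treating the second number as the size of the ranked field; `parse_rank` and `is_sota` are hypothetical helper names, not part of any Codesota tooling.

```python
import re


def parse_rank(cell: str) -> tuple[int, int]:
    """Split a rank cell such as '#7/13' into (position, total)."""
    match = re.fullmatch(r"#(\d+)/(\d+)", cell.strip())
    if match is None:
        raise ValueError(f"unexpected rank cell: {cell!r}")
    return int(match.group(1)), int(match.group(2))


def is_sota(cell: str) -> bool:
    """A position of 1 is what the page labels current SOTA."""
    position, _total = parse_rank(cell)
    return position == 1
```

Applied to the table, only the Natural Questions row (`#1/5`) satisfies `is_sota`, which agrees with the "1 current SOTA" count in the page header.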
§ 03 · Strengths by area

Where LLaMA-65B actually performs.

Natural Language Processing: 2 benchmarks · avg rank #1.5 · 1 SOTA
Computer Code: 2 benchmarks · avg rank #9.5
Reasoning: 5 benchmarks · avg rank #31.8
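The per-area averages are consistent with a plain arithmetic mean of the rank positions in the benchmark table. This is a minimal sketch under that assumption (the page does not state its formula); the `ROWS` list and the `average_rank_by_area` function are illustrative names, with the ranks copied from the table.

```python
from collections import defaultdict

# (benchmark, area, rank position) copied from the benchmark table.
ROWS = [
    ("Natural Questions", "Natural Language Processing", 1),
    ("TriviaQA", "Natural Language Processing", 2),
    ("WinoGrande", "Reasoning", 7),
    ("HellaSwag", "Reasoning", 8),
    ("MBPP+", "Computer Code", 9),
    ("HumanEval+", "Computer Code", 10),
    ("GSM8K", "Reasoning", 41),
    ("MATH", "Reasoning", 45),
    ("MMLU", "Reasoning", 58),
]


def average_rank_by_area(rows):
    """Group rank positions by area and return the mean, rounded to one decimal."""
    ranks = defaultdict(list)
    for _benchmark, area, rank in rows:
        ranks[area].append(rank)
    return {area: round(sum(r) / len(r), 1) for area, r in ranks.items()}


averages = average_rank_by_area(ROWS)
```

For Reasoning, the mean is (7 + 8 + 41 + 45 + 58) / 5 = 31.8, which matches the figure shown for that area; the other two areas work out to 1.5 and 9.5 in the same way.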
§ 04 · Papers

1 paper with results for LLaMA-65B.

  1. 2023-02-27 · 9 results

    LLaMA: Open and Efficient Foundation Language Models

§ 06 · Sources & freshness

Where these numbers come from.

pwc-dump: 9 results
0 of 9 rows marked verified.