Codesota · Benchmark · HellaSwag

HellaSwag.

70K sentence completion problems testing commonsense natural language inference.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


accuracy

Accuracy is the reported evaluation metric for HellaSwag. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
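As a rough illustration of how an accuracy score of this kind is computed, the sketch below assumes the usual multiple-choice setup: each item offers several candidate endings, the model assigns a score to each, and the highest-scoring ending is taken as the prediction. The per-ending scores, the gold labels, and the helper names `pick_ending` and `accuracy` are hypothetical and used only for illustration; they are not taken from Codesota or from any specific evaluation harness.

```python
from typing import Sequence


def pick_ending(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate ending."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Percentage of items whose predicted ending matches the gold label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical model scores for three items with four endings each.
item_scores = [
    [0.1, 0.7, 0.1, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.2, 0.2, 0.5, 0.1],
]
gold = [1, 0, 3]  # hypothetical gold ending indices

preds = [pick_ending(s) for s in item_scores]
print(f"accuracy: {accuracy(preds, gold):.1f}")
```

Published scores can differ in details this sketch ignores, such as the number of few-shot examples or whether ending scores are length-normalized, so figures from different sources are not always directly comparable.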

Trust tiers for accuracy: verified, paper, vendor, community, unverified
| Rank | Model | Trust | Score | Year | Links | Notes |
|---|---|---|---|---|---|---|
| 01 | gpt-4o | paper | 95.3 | 2025 | Source | Commonsense NLI. Models now exceed human performance (95.6%). |
| 02 | Gemini 1.5 Pro | unverified | 92.5 | 2025 | Source | |
| 03 | gemini-15-pro | paper | 92.5 | 2025 | Source | |
| 04 | Step-3.5-Flash Base | unverified | 90.2 | 2026 | Paper, Code | |
| 05 | Trinity Large Base (5-shot) | unverified | 90.11 | 2026 | Paper, Code | |
| 06 | Llama 3.1 405B | verified | 89 | 2026 | Source | Llama 3.1 405B Instruct. Official Meta model card evaluation. |
| 07 | claude-35-sonnet | paper | 89 | 2025 | Source | |
| 08 | Claude 3.5 Sonnet | unverified | 89 | 2025 | Source | |
| 09 | Llama 3 70B | unverified | 88 | 2025 | Source | |
| 10 | llama-3-70b | paper | 88 | 2025 | Source | |
| 11 | LLaMA-65B | unverified | 84.2 | 2023 | Paper, Code | |
| 12 | Chameleon 34B | unverified | 82.7 | 2024 | Paper, Code | |
| 13 | BLT-Entropy 8B | unverified | 80.6 | 2024 | Paper, Code | |
| 14 | Apertus-70B-Instruct | unverified | 78.1 | 2025 | Paper, Code | |
| 15 | Helium | unverified | 76.3 | 2024 | Paper, Code | |
| 16 | SmoLM2 (1.7B) | unverified | 68.7 | 2025 | Paper, Code | |
| 17 | BitNet b1.58 2B4T | unverified | 68.44 | 2025 | Paper, Code | |
| 18 | Apertus-70B | unverified | 64 | 2025 | Paper, Code | |
| 19 | HRM-Text-1B | unverified | 63.4 | 2026 | Paper, Code | |
| 20 | OLMo-2-7B-1124 (olmOCR-peS2o) | unverified | 62.6 | 2025 | Paper, Code | |
