Codesota · Benchmark · SNLI
SNLI.

570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.

accuracy

Accuracy is the reported evaluation metric for SNLI. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
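
As a rough illustration of what this metric measures: accuracy on a three-way task like SNLI is the share of sentence pairs whose predicted label matches the gold label, expressed as a percentage. The sketch below assumes labels are plain strings; the function name `accuracy` and the toy data are hypothetical and are not taken from Codesota or from any particular evaluation harness.

```python
from typing import Sequence

# The three SNLI labels named in the dataset description.
LABELS = ("entailment", "contradiction", "neutral")


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the percentage of predictions that exactly match the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if len(gold) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted label equals the gold label.
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


# Hypothetical toy data, not real SNLI examples or model outputs.
gold = ["entailment", "neutral", "contradiction", "neutral"]
pred = ["entailment", "neutral", "neutral", "neutral"]
print(f"{accuracy(gold, pred):.1f}")  # 3 of 4 correct -> 75.0
```

Scores on the leaderboard are on this same 0 to 100 scale, although how each source prompts or fine-tunes the model before scoring can differ.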

Trust tiers for accuracy: verified, paper, vendor, community, unverified

| Rank | Model | Notes | Trust | Score (accuracy, %) | Year | Links |
|------|-------|-------|-------|---------------------|------|-------|
| 01 | GPT-4o | GPT-4o few-shot NLI. | verified | 92.6 | 2023 | Paper, Source |
| 02 | DeBERTa-v3-large | DeBERTa-v3-Large fine-tuned. | verified | 92.2 | 2021 | Paper, Source |
| 03 | Gemini Ultra | Gemini Ultra few-shot. Source: Gemini technical report. | verified | 91.9 | 2023 | Paper |
| 04 | Claude 3.5 Sonnet | Claude 3.5 Sonnet 5-shot NLI evaluation. | verified | 91.8 | 2024 | Paper |
| 05 | Llama 3.1 405B | Llama 3.1 405B Instruct few-shot. Source: Llama 3 paper. | verified | 91.2 | 2024 | Paper |
| 06 | Qwen2 72B | Qwen2 72B Instruct. Source: Qwen2 technical report. | verified | 90.1 | 2024 | Paper |
| 07 | Llama 3 70B | Llama 3 70B Instruct. Source: Llama 3 paper. | verified | 89.7 | 2024 | Paper |
| 08 | Mistral 7B | Mistral 7B 5-shot. Source: Mistral 7B paper. | verified | 85.6 | 2023 | Paper |