Codesota · Benchmark · XNLI

XNLI.

Cross-lingual natural language inference across 15 languages

§ 02 · Leaderboard

Results by metric.

Only 3 models have results on this benchmark so far.

Accuracy

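Accuracy is the evaluation metric reported for XNLI: the fraction of premise-hypothesis pairs whose label (entailment, neutral, or contradiction) a model predicts correctly. The scores below are averages across the 15 XNLI languages. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.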

Higher is better
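
A minimal Python sketch of how an averaged XNLI accuracy can be computed, assuming the reported average is an unweighted mean of per-language accuracy over the 15 languages. The helper names (`accuracy`, `xnli_average`) and the toy labels are hypothetical and only illustrate the arithmetic.

```python
# The 15 XNLI languages, as ISO 639-1 codes.
XNLI_LANGUAGES = ["en", "fr", "es", "de", "el", "bg", "ru", "tr",
                  "ar", "vi", "th", "zh", "hi", "sw", "ur"]


def accuracy(gold, pred):
    """Fraction of examples whose predicted label equals the gold label.

    Labels are NLI classes: "entailment", "neutral", or "contradiction".
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def xnli_average(results):
    """Unweighted mean of per-language accuracy, returned in percent.

    `results` maps each language code to a (gold_labels, predicted_labels) pair.
    Assumption: the leaderboard average weights every language equally.
    """
    missing = set(XNLI_LANGUAGES) - set(results)
    if missing:
        raise ValueError(f"missing languages: {sorted(missing)}")
    per_language = [accuracy(*results[lang]) for lang in XNLI_LANGUAGES]
    return 100 * sum(per_language) / len(per_language)


if __name__ == "__main__":
    # Toy data: every language gets 3 of 4 predictions right.
    gold = ["entailment", "neutral", "contradiction", "neutral"]
    pred = ["entailment", "neutral", "contradiction", "entailment"]
    toy = {lang: (gold, pred) for lang in XNLI_LANGUAGES}
    print(xnli_average(toy))  # 75.0
```

Because every language counts equally, a model that is strong in English but weak in low-resource languages such as Swahili or Urdu is pulled down accordingly.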

Trust tiers for Accuracy: verified, paper, vendor, community, unverified.

| Rank | Model | Trust | Accuracy (%) | Year | Notes |
|---|---|---|---|---|---|
| 01 | GPT-4 | verified | 87.4 | 2023 | GPT-4 average XNLI accuracy (15 languages). From GPT-4 Technical Report and cross-lingual evaluation studies. |
| 02 | XLM-RoBERTa-large | verified | 83.6 | 2019 | XLM-RoBERTa-large XNLI average accuracy (15 languages). State of the art at publication. From Table 3. |
| 03 | mDeBERTa-v3-base | verified | 80.8 | 2022 | mDeBERTa-v3-base average XNLI accuracy across 15 languages. From HF model card. |
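
The ranking above can be reproduced with a short Python sketch that sorts entries by accuracy in descending order, since higher is better. The `Entry` type, the `rank` helper, and the ordering of trust tiers from most to least trusted (taken from the order the page lists them) are assumptions for illustration; only the three scores are copied from the table.

```python
from dataclasses import dataclass

# Trust tiers in the order the page lists them. Treating this order as
# most-to-least trusted is an assumption made for this sketch.
TRUST_TIERS = ["verified", "paper", "vendor", "community", "unverified"]


@dataclass(frozen=True)
class Entry:
    model: str
    trust: str
    accuracy: float  # average XNLI accuracy, in percent
    year: int


# Scores copied from the leaderboard table, deliberately out of order.
ENTRIES = [
    Entry("mDeBERTa-v3-base", "verified", 80.8, 2022),
    Entry("GPT-4", "verified", 87.4, 2023),
    Entry("XLM-RoBERTa-large", "verified", 83.6, 2019),
]


def rank(entries, min_trust="unverified"):
    """Return (rank, entry) pairs, highest accuracy first.

    Entries whose tier falls below `min_trust` in TRUST_TIERS are dropped.
    """
    cutoff = TRUST_TIERS.index(min_trust)
    kept = [e for e in entries if TRUST_TIERS.index(e.trust) <= cutoff]
    # Higher is better, so sort by accuracy in descending order.
    kept.sort(key=lambda e: e.accuracy, reverse=True)
    return list(enumerate(kept, start=1))


if __name__ == "__main__":
    for position, e in rank(ENTRIES, min_trust="verified"):
        print(f"{position:02d} {e.model:<20} {e.accuracy:.1f} ({e.year})")
```

Python's sort is stable, so tied scores keep their input order; a real leaderboard might instead break ties by year or trust tier.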