
WMT'23.

State-of-the-art machine translation results from the WMT 2023 General MT shared task.

§ 01 · Leaderboard

Results by metric.

Only 4 models on this benchmark

COMET

COMET is the reported evaluation metric for WMT'23. All rows below report COMET-22 on the WMT23 en→de test set. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for COMET: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Notes |
|------|-------|-------|-------|------|-------|
| 01 | GPT-4 | verified | 84.1 | 2023 | GPT-4 COMET-22 score on WMT23 en→de test set. From WMT23 General MT findings. |
| 02 | Google Translate | verified | 83.8 | 2023 | Google Translate (ONLINE-B) COMET-22 on WMT23 en→de. Consistently top-tier online system. |
| 03 | DeepL | verified | 83.5 | 2023 | DeepL (ONLINE-W) COMET-22 on WMT23 en→de. From WMT23 findings paper. |
| 04 | NLLB-3.3B | verified | 81.6 | 2023 | NLLB-3.3B COMET-22 on WMT23 en→de. Open-source strong baseline. |
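
The ranking above follows directly from the "higher is better" rule applied to the four listed scores. As a minimal sketch of that ordering, the Python below stores the rows and sorts them by score. The `Result` class and the `rank` and `gap_to_leader` helpers are hypothetical names introduced for illustration and are not part of any Codesota API. The scores are copied from the leaderboard rows as published.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    trust: str
    score: float  # COMET-22 on WMT23 en->de, on the 0-100 scale used above
    year: int


# Rows copied from the leaderboard, deliberately left unsorted.
RESULTS = [
    Result("NLLB-3.3B", "verified", 81.6, 2023),
    Result("GPT-4", "verified", 84.1, 2023),
    Result("DeepL", "verified", 83.5, 2023),
    Result("Google Translate", "verified", 83.8, 2023),
]


def rank(results, higher_is_better=True):
    """Return (position, result) pairs ordered by score."""
    ordered = sorted(results, key=lambda r: r.score, reverse=higher_is_better)
    return list(enumerate(ordered, start=1))


def gap_to_leader(results):
    """Map each model to its score gap from the best score, rounded to 0.1."""
    best = max(r.score for r in results)
    return {r.model: round(best - r.score, 1) for r in results}


if __name__ == "__main__":
    gaps = gap_to_leader(RESULTS)
    for position, r in rank(RESULTS):
        print(f"{position:02d}  {r.model:<17} {r.score:.1f}  (-{gaps[r.model]:.1f})")
```

The spread between first and last place is 2.5 points, and the top three systems sit within 0.6 points of each other, so small differences in rank should be read with care.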
