State-of-the-art machine translation evaluation from the WMT 2023 shared task
COMET is the evaluation metric reported for WMT'23. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score | Year |
|---|---|---|---|---|
| 01 | GPT-4 | verified | 84.1 | 2023 |
| 02 | Google Translate | verified | 83.8 | 2023 |
| 03 | DeepL | verified | 83.5 | 2023 |
| 04 | NLLB-3.3B | verified | 81.6 | 2023 |
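
For readers who want to reproduce the ordering or compare gaps between entries, here is a minimal sketch in Python. The rows are copied from the table above; the helper name `rank_by_score` and the `gap` field are illustrative assumptions, not part of any Codesota tooling.

```python
# Rows copied from the leaderboard table: (model, COMET score).
# Deliberately listed out of order to show that the sort does the ranking.
rows = [
    ("NLLB-3.3B", 81.6),
    ("GPT-4", 84.1),
    ("DeepL", 83.5),
    ("Google Translate", 83.8),
]


def rank_by_score(rows):
    """Rank models by score, highest first (hypothetical helper).

    Returns one dict per model with its rank, score, and the
    gap to the top-scoring entry, rounded to one decimal place.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    best = ordered[0][1]
    return [
        {"rank": i, "model": model, "score": score, "gap": round(best - score, 1)}
        for i, (model, score) in enumerate(ordered, start=1)
    ]


ranked = rank_by_score(rows)
for entry in ranked:
    print(f'{entry["rank"]:02d} {entry["model"]:<17} '
          f'{entry["score"]:.1f} gap {entry["gap"]:.1f}')
```

On these numbers, the top three entries sit within 0.6 points of each other, while the fourth trails the leader by 2.5 points.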