About 300K news articles, each paired with a multi-sentence summary. A standard benchmark for abstractive summarization.
ROUGE-1 (unigram overlap) is one of the evaluation metrics reported for CNN/DailyMail. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | BRIO | verified | 47.78 | 2022 | Paper ↗ |
| 02 | GPT-4o | verified | 46.3 | 2023 | Paper ↗, Source ↗ |
| 03 | Gemini 1.5 Pro | verified | 45.8 | 2024 | Paper ↗ |
| 04 | Llama 3.1 405B | verified | 45.1 | 2024 | Paper ↗ |
| 05 | Qwen2 72B | verified | 44.7 | 2024 | Paper ↗ |
| 06 | PEGASUS-Large | verified | 44.17 | 2019 | Paper ↗ |
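As a rough illustration of what an n-gram ROUGE score measures, the sketch below computes precision, recall, and F1 from clipped n-gram overlap between a candidate summary and a reference. It is a minimal sketch, not the scoring pipeline behind the table: it assumes lowercased whitespace tokenization with no stemming, and the function names `ngrams` and `rouge_n` are hypothetical. Published results typically come from an established ROUGE package, so this code will not reproduce leaderboard numbers exactly. The scores listed here appear to be F1 values scaled to 0-100.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter (multiset) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) for ROUGE-N on two strings.

    Assumption: tokens are lowercased whitespace-separated words.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"

p1, r1, f1_1 = rouge_n(candidate, reference, n=1)
p2, r2, f1_2 = rouge_n(candidate, reference, n=2)

print(f"ROUGE-1 F1: {100 * f1_1:.2f}")  # 83.33
print(f"ROUGE-2 F1: {100 * f1_2:.2f}")  # 60.00
```

Setting `n=1` corresponds to ROUGE-1 and `n=2` to ROUGE-2. The bigram score drops faster than the unigram score when a single word changes, which is consistent with ROUGE-2 values on this benchmark being much lower than ROUGE-1 values.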
ROUGE-L (longest common subsequence) is one of the evaluation metrics reported for CNN/DailyMail. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | BRIO | verified | 44.57 | 2022 | Paper ↗ |
| 02 | GPT-4o | verified | 43.4 | 2023 | Paper ↗, Source ↗ |
| 03 | Gemini 1.5 Pro | verified | 43 | 2024 | Paper ↗ |
| 04 | Llama 3.1 405B | verified | 42.3 | 2024 | Paper ↗ |
| 05 | Qwen2 72B | verified | 41.8 | 2024 | Paper ↗ |
| 06 | PEGASUS-Large | verified | 41.11 | 2019 | Paper ↗ |
| 07 | BART | unverified | 40.9 | 2019 | Paper ↗, Code ↗ |
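To make the longest-common-subsequence idea behind ROUGE-L concrete, here is a minimal sketch that scores a candidate against a reference by LCS length. The helper names `lcs_length` and `rouge_l` are hypothetical, and the code assumes lowercased whitespace tokenization and treats each text as a single token sequence. Summarization papers often report a summary-level variant that works sentence by sentence, so this sketch should be read as an illustration of the idea rather than the exact procedure behind the scores above.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Return (precision, recall, f1) based on LCS length.

    Assumption: tokens are lowercased whitespace-separated words.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = "the cat sat on the mat"
candidate = "on the mat the cat sat"  # same words, different order

precision, recall, f1 = rouge_l(candidate, reference)
print(f"ROUGE-L F1: {100 * f1:.2f}")  # 50.00
```

The candidate here contains exactly the same words as the reference, so a pure unigram overlap would be perfect, yet the LCS covers only three of six tokens. That sensitivity to word order is what distinguishes ROUGE-L from the n-gram variants.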
ROUGE-2 (bigram overlap) is one of the evaluation metrics reported for CNN/DailyMail. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | BRIO | verified | 23.55 | 2022 | Paper ↗ |
| 02 | GPT-4o | verified | 22.1 | 2023 | Paper ↗, Source ↗ |
| 03 | PEGASUS-Large | verified | 21.47 | 2019 | Paper ↗ |