DART is a machine learning benchmark indexed on Codesota. This page tracks published model results, top scores per metric, and the SOTA timeline for DART.
FactSpotter is one of the evaluation metrics reported for DART. Codesota tracks published model scores on each metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | FactT5B | verified | 97.6 | 2023 | Paper, Code |
| 02 | FactJointGT | verified | 97.25 | 2023 | Paper, Code |
| 03 | T5B Baseline | verified | 96.65 | 2023 | Paper, Code |
| 04 | JointGT Baseline | verified | 95.86 | 2023 | Paper, Code |
BLEU is one of the evaluation metrics reported for DART.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | T5B Baseline | verified | 48.47 | 2023 | Paper, Code |
| 02 | FactT5B | verified | 48.37 | 2023 | Paper, Code |
| 03 | JointGT Baseline | verified | 47.51 | 2023 | Paper, Code |
| 04 | FactJointGT | verified | 47.39 | 2023 | Paper, Code |
| 05 | HTLM (fine-tuning) | verified | 47.2 | 2021 | Paper |
| 06 | GPT-2-Large (fine-tuning) | verified | 47 | 2021 | Paper |
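The FactSpotter and BLEU tables order the same four 2023 entries differently, so it can help to look at the per-metric differences directly. The sketch below copies the scores from the two tables above and subtracts each baseline from its counterpart. The pairing of `FactT5B` with `T5B Baseline` and of `FactJointGT` with `JointGT Baseline` is an assumption inferred from the model names only, and the `deltas` helper is hypothetical, not part of Codesota.

```python
# Scores copied from the FactSpotter and BLEU tables on this page.
scores = {
    "FactSpotter": {
        "T5B Baseline": 96.65,
        "FactT5B": 97.6,
        "JointGT Baseline": 95.86,
        "FactJointGT": 97.25,
    },
    "BLEU": {
        "T5B Baseline": 48.47,
        "FactT5B": 48.37,
        "JointGT Baseline": 47.51,
        "FactJointGT": 47.39,
    },
}

# Assumption: each Fact* entry is compared against the baseline whose
# name it shares. This pairing is inferred from the names alone.
pairs = {"FactT5B": "T5B Baseline", "FactJointGT": "JointGT Baseline"}


def deltas(scores, pairs):
    """Return {(metric, variant): variant score minus baseline score}."""
    out = {}
    for metric, table in scores.items():
        for variant, baseline in pairs.items():
            # Round to the two decimals used in the tables.
            out[(metric, variant)] = round(table[variant] - table[baseline], 2)
    return out


result = deltas(scores, pairs)
for (metric, variant), diff in result.items():
    print(f"{metric:<12} {variant:<12} {diff:+.2f}")
```

Under that pairing, both Fact* entries score higher on FactSpotter and slightly lower on BLEU than their baselines. The differences are small and come from single reported numbers, so they should be read as descriptive only.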
BERTScore is one of the evaluation metrics reported for DART.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | FactT5B | verified | 0.95 | 2023 | Paper, Code |
| 02 | T5B Baseline | verified | 0.95 | 2023 | Paper, Code |
| 03 | JointGT Baseline | verified | 0.95 | 2023 | Paper, Code |
| 04 | FactJointGT | verified | 0.95 | 2023 | Paper, Code |
| 05 | HTLM (fine-tuning) | verified | 0.94 | 2021 | Paper |
| 06 | GPT-2-Large (fine-tuning) | verified | 0.94 | 2021 | Paper |
BLEURT is one of the evaluation metrics reported for DART.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | T5B Baseline | verified | 0.67 | 2023 | Paper, Code |
| 02 | FactT5B | verified | 0.67 | 2023 | Paper, Code |
| 03 | JointGT Baseline | verified | 0.67 | 2023 | Paper, Code |
| 04 | FactJointGT | verified | 0.67 | 2023 | Paper, Code |
| 05 | HTLM (fine-tuning) | verified | 0.40 | 2021 | Paper |
| 06 | GPT-2-Large (fine-tuning) | verified | 0.40 | 2021 | Paper |
MoverScore is one of the evaluation metrics reported for DART.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | GPT-2-Large (fine-tuning) | verified | 0.51 | 2021 | Paper |
| 02 | HTLM (fine-tuning) | verified | 0.51 | 2021 | Paper |
TER is one of the evaluation metrics reported for DART.
Lower is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | HTLM (fine-tuning) | verified | 0.44 | 2021 | Paper |
| 02 | GPT-2-Large (fine-tuning) | verified | 0.46 | 2021 | Paper |
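Rank order on this page depends on the direction of each metric. TER is an edit rate, so a lower score means fewer edits are needed to match the reference, which is the opposite direction from BLEU or METEOR. The sketch below shows one way to rank rows with an explicit direction flag. The `rank` helper and its `higher_is_better` parameter are hypothetical names used for illustration, not a Codesota API.

```python
def rank(rows, higher_is_better=True):
    """Sort (model, score) rows and attach 1-based ranks.

    rows: list of (model_name, score) tuples.
    higher_is_better: set to False for error-style metrics such as TER.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=higher_is_better)
    return [(position + 1, name, score)
            for position, (name, score) in enumerate(ordered)]


# TER scores copied from the table above.
ter_rows = [
    ("GPT-2-Large (fine-tuning)", 0.46),
    ("HTLM (fine-tuning)", 0.44),
]

# Assumption: TER is ranked with lower scores first.
ter_ranked = rank(ter_rows, higher_is_better=False)
for position, name, score in ter_ranked:
    print(f"{position:02d} {name} {score:.2f}")
```

Sorting the same rows with the default `higher_is_better=True` would put GPT-2-Large (fine-tuning) first, so the flag needs to be set per metric rather than once for the whole page.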
METEOR is one of the evaluation metrics reported for DART.
Higher is better
| Rank | Model | Trust | Score | Year | Links |
|---|---|---|---|---|---|
| 01 | T5B Baseline | verified | 0.41 | 2023 | Paper, Code |
| 02 | FactT5B | verified | 0.41 | 2023 | Paper, Code |
| 03 | JointGT Baseline | verified | 0.40 | 2023 | Paper, Code |
| 04 | FactJointGT | verified | 0.40 | 2023 | Paper, Code |
| 05 | HTLM (fine-tuning) | verified | 0.39 | 2021 | Paper |
| 06 | GPT-2-Large (fine-tuning) | verified | 0.39 | 2021 | Paper |