DART

DART is a structured-data-to-text generation benchmark indexed on Codesota. This page tracks published model results, top scores per metric, and the SOTA timeline for DART.

§ 01 · Leaderboard

Results by metric.

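FactSpotter

FactSpotter is one of the evaluation metrics reported for DART. It comes from the paper cited in the table and targets the factual faithfulness of graph-to-text generation. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.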

Higher is better

Trust tiers for FactSpotter: verified, paper, vendor, community, unverified

Muted rows were not state of the art when published — an earlier or same-year result already scored better.

Rank	Model	Source paper	Trust	Score	Year
01	FactT5B	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	97.6	2023
02	FactJointGT	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	97.25	2023
03	T5B Baseline	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	96.65	2023
04	JointGT Baseline	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	95.86	2023
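
The muted-row rule stated above the table can be read as a simple check per row: is there an earlier or same-year result with a better score? The Python sketch below is one possible reading, not Codesota's actual implementation. The `Result` class, the `is_muted` function, the strict comparison (ties are not muted), and the `higher_is_better` flag are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    # Hypothetical record type; the fields mirror the table columns used here.
    model: str
    score: float
    year: int


def is_muted(row: Result, rows: list[Result], higher_is_better: bool = True) -> bool:
    """Return True if an earlier or same-year result already scored strictly better."""
    for other in rows:
        if other is row or other.year > row.year:
            continue  # skip the row itself and anything published later
        if higher_is_better:
            beats = other.score > row.score
        else:
            beats = other.score < row.score
        if beats:
            return True
    return False


# FactSpotter rows from the table above (all published in 2023).
factspotter = [
    Result("FactT5B", 97.6, 2023),
    Result("FactJointGT", 97.25, 2023),
    Result("T5B Baseline", 96.65, 2023),
    Result("JointGT Baseline", 95.86, 2023),
]

muted = [r.model for r in factspotter if is_muted(r, factspotter)]
print(muted)  # ['FactJointGT', 'T5B Baseline', 'JointGT Baseline']
```

Under this reading only the top FactSpotter row stays unmuted, because every other 2023 result is beaten by a same-year score. Whether tied scores should count as muted is not stated on the page, so the strict comparison is a design choice of this sketch.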

BLEU

BLEU is one of the evaluation metrics reported for DART. It scores n-gram overlap between generated text and reference texts, and is reported here on a 0 to 100 scale.

Higher is better

Trust tiers for BLEU: verified, paper, vendor, community, unverified

Rank	Model	Source paper	Trust	Score	Year
01	T5B Baseline	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	48.47	2023
02	FactT5B	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	48.37	2023
03	JointGT Baseline	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	47.51	2023
04	FactJointGT	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	47.39	2023
05	HTLM (fine-tuning)	HTLM: Hyper-Text Pre-Training and Prompting of Language Models	verified	47.2	2021
06	GPT-2-Large (fine-tuning)	HTLM: Hyper-Text Pre-Training and Prompting of Language Models	verified	47	2021

BERTScore

BERTScore is one of the evaluation metrics reported for DART. It compares generated and reference text through the similarity of their contextual BERT token embeddings.

Higher is better

Trust tiers for BERTScore: verified, paper, vendor, community, unverified

Rank	Model	Source paper	Trust	Score	Year
01	FactT5B	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	0.95	2023
02	T5B Baseline	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	0.95	2023
03	JointGT Baseline	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	0.95	2023
04	FactJointGT	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	0.95	2023
05	HTLM (fine-tuning)	HTLM: Hyper-Text Pre-Training and Prompting of Language Models	verified	0.94	2021
06	GPT-2-Large (fine-tuning)	HTLM: Hyper-Text Pre-Training and Prompting of Language Models	verified	0.94	2021

BLEURT

BLEURT is one of the evaluation metrics reported for DART. It is a learned metric: a BERT-based model fine-tuned to predict human quality ratings for generated text.

Higher is better

Trust tiers for BLEURT: verified, paper, vendor, community, unverified

Rank	Model	Source paper	Trust	Score	Year
01	T5B Baseline	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	0.67	2023
02	FactT5B	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	0.67	2023
03	JointGT Baseline	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	0.67	2023
04	FactJointGT	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	0.67	2023
05	HTLM (fine-tuning)	HTLM: Hyper-Text Pre-Training and Prompting of Language Models	verified	0.40	2021
06	GPT-2-Large (fine-tuning)	HTLM: Hyper-Text Pre-Training and Prompting of Language Models	verified	0.40	2021

MoverScore

MoverScore is one of the evaluation metrics reported for DART. It measures the distance between generated and reference text using contextual embeddings and an Earth Mover's Distance formulation.

Higher is better

Trust tiers for MoverScore: verified, paper, vendor, community, unverified

Rank	Model	Source paper	Trust	Score	Year
01	GPT-2-Large (fine-tuning)	HTLM: Hyper-Text Pre-Training and Prompting of Language Models	verified	0.51	2021
02	HTLM (fine-tuning)	HTLM: Hyper-Text Pre-Training and Prompting of Language Models	verified	0.51	2021

TER

TER (Translation Edit Rate) is one of the evaluation metrics reported for DART. It counts the edits needed to turn generated text into a reference, normalized by reference length, so a smaller value means a closer match.

Lower is better

Trust tiers for TER: verified, paper, vendor, community, unverified

Rank	Model	Source paper	Trust	Score	Year
01	HTLM (fine-tuning)	HTLM: Hyper-Text Pre-Training and Prompting of Language Models	verified	0.44	2021
02	GPT-2-Large (fine-tuning)	HTLM: Hyper-Text Pre-Training and Prompting of Language Models	verified	0.46	2021

METEOR

METEOR is one of the evaluation metrics reported for DART. It aligns unigrams between generated and reference text, allowing stem and synonym matches, and combines precision and recall.

Higher is better

Trust tiers for METEOR: verified, paper, vendor, community, unverified

Rank	Model	Source paper	Trust	Score	Year
01	T5B Baseline	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	0.41	2023
02	FactT5B	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	0.41	2023
03	JointGT Baseline	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	0.40	2023
04	FactJointGT	FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation	verified	0.40	2023
05	HTLM (fine-tuning)	HTLM: Hyper-Text Pre-Training and Prompting of Language Models	verified	0.39	2021
06	GPT-2-Large (fine-tuning)	HTLM: Hyper-Text Pre-Training and Prompting of Language Models	verified	0.39	2021
