
DART.

DART is a machine learning benchmark indexed on Codesota. This page tracks published model results, top scores per metric, and the SOTA timeline for DART.

§ 01 · Leaderboard

Results by metric.


FactSpotter

FactSpotter, a metric for the factual faithfulness of graph-to-text generation, is one of the evaluation metrics reported for DART. Codesota tracks published model scores on this metric so readers can compare results across sources and model families.

Higher is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Source paper |
|---|---|---|---|---|---|
| 01 | FactT5B | verified | 97.6 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 02 | FactJointGT | verified | 97.25 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 03 | T5B Baseline | verified | 96.65 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 04 | JointGT Baseline | verified | 95.86 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |

BLEU

BLEU is one of the evaluation metrics reported for DART.

Higher is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Source paper |
|---|---|---|---|---|---|
| 01 | T5B Baseline | verified | 48.47 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 02 | FactT5B | verified | 48.37 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 03 | JointGT Baseline | verified | 47.51 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 04 | FactJointGT | verified | 47.39 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 05 | HTLM (fine-tuning) | verified | 47.2 | 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 06 | GPT-2-Large (fine-tuning) | verified | 47 | 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language Models |

BERTScore

BERTScore is one of the evaluation metrics reported for DART.

Higher is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Source paper |
|---|---|---|---|---|---|
| 01 | FactT5B | verified | 0.95 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 02 | T5B Baseline | verified | 0.95 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 03 | JointGT Baseline | verified | 0.95 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 04 | FactJointGT | verified | 0.95 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 05 | HTLM (fine-tuning) | verified | 0.94 | 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 06 | GPT-2-Large (fine-tuning) | verified | 0.94 | 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language Models |

BLEURT

BLEURT is one of the evaluation metrics reported for DART.

Higher is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Source paper |
|---|---|---|---|---|---|
| 01 | T5B Baseline | verified | 0.67 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 02 | FactT5B | verified | 0.67 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 03 | JointGT Baseline | verified | 0.67 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 04 | FactJointGT | verified | 0.67 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 05 | HTLM (fine-tuning) | verified | 0.40 | 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 06 | GPT-2-Large (fine-tuning) | verified | 0.40 | 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language Models |

MoverScore

MoverScore is one of the evaluation metrics reported for DART.

Higher is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Source paper |
|---|---|---|---|---|---|
| 01 | GPT-2-Large (fine-tuning) | verified | 0.51 | 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 02 | HTLM (fine-tuning) | verified | 0.51 | 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language Models |

TER

TER (translation edit rate) is one of the evaluation metrics reported for DART. It counts the edits needed to turn a system output into a reference, so a smaller value means a closer match.

Lower is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Source paper |
|---|---|---|---|---|---|
| 01 | HTLM (fine-tuning) | verified | 0.44 | 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 02 | GPT-2-Large (fine-tuning) | verified | 0.46 | 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
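
Rank order depends on each metric's direction: an edit-rate metric such as TER is conventionally ranked lowest-first, while overlap metrics such as BLEU are ranked highest-first. The following is a minimal sketch of that rule, not Codesota's actual implementation. The `Result` record and the `rank` helper are hypothetical names introduced for illustration; the scores are the two TER values listed for DART.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One leaderboard row (hypothetical record, not a Codesota type)."""
    model: str
    score: float


def rank(results, higher_is_better):
    """Return results ordered best-first for the given metric direction.

    sorted() is stable, so rows with equal scores keep their input order.
    """
    return sorted(results, key=lambda r: r.score, reverse=higher_is_better)


# TER scores for DART as listed on this page.
ter_results = [
    Result("GPT-2-Large (fine-tuning)", 0.46),
    Result("HTLM (fine-tuning)", 0.44),
]

# TER is an error rate, so the smaller score ranks first.
ter_ranked = rank(ter_results, higher_is_better=False)
for position, row in enumerate(ter_ranked, start=1):
    print(f"{position:02d} {row.model} {row.score}")
```

Passing `higher_is_better=True` to the same helper reproduces the ordering used for the other metrics on this page, such as BLEU and METEOR.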

METEOR

METEOR is one of the evaluation metrics reported for DART.

Higher is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Source paper |
|---|---|---|---|---|---|
| 01 | T5B Baseline | verified | 0.41 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 02 | FactT5B | verified | 0.41 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 03 | JointGT Baseline | verified | 0.40 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 04 | FactJointGT | verified | 0.40 | 2023 | FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation |
| 05 | HTLM (fine-tuning) | verified | 0.39 | 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 06 | GPT-2-Large (fine-tuning) | verified | 0.39 | 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language Models |