
e2e.

e2e (the E2E NLG data-to-text generation task) is a machine learning benchmark indexed on Codesota. This page tracks published model results, top scores per metric, and the state-of-the-art (SOTA) timeline for e2e.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


ROUGE-L

ROUGE-L is one of the evaluation metrics reported for e2e. Codesota tracks published model scores on this metric so readers can compare results across sources and model families.

Higher is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Notes |
|---|---|---|---|---|---|
| 01 | GPT-2-Large (prefix-tuning) | verified | 71.7 | 2021 | GPT-2 Large prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 02 | GPT-2-Medium (prefix-tuning) | verified | 71.4 | 2021 | GPT-2 Medium prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 03 | HTLM (prefix-tuning) | verified | 71.2 | 2021 | HTLM prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 04 | GPT-2-Medium (fine-tuning) | verified | 71 | 2021 | GPT-2 Medium fine-tuning on E2E NLG. From Table 3 in HTLM paper. |
| 05 | HTLM (fine-tuning) | verified | 70.8 | 2021 | From paper: HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 06 | GPT-2-Large (fine-tuning) | verified | 69.9 | 2021 | From paper: HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 07 | T5-base (STSM) | verified | 68.97 | 2024 | T5-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |
| 08 | BART-base (STSM) | verified | 68.76 | 2024 | BART-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |
| 09 | FLAN-T5-base (STSM) | verified | 67.85 | 2024 | FLAN-T5-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |
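
For readers unfamiliar with the metric, ROUGE-L is built on the longest common subsequence (LCS) between a generated text and a reference. The following is a minimal sketch only: it assumes lowercased whitespace tokenization, a single reference, and a balanced F-measure, and the function names are illustrative. It is not the scoring script used by the cited papers, so its values should not be expected to reproduce the leaderboard numbers (which are reported on a 0 to 100 scale).

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Balanced ROUGE-L F-measure in [0, 1] (assumed simplification)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

Multiplying the result by 100 puts it on the same scale as the scores listed here.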

BLEU

BLEU is one of the evaluation metrics reported for e2e. Codesota tracks published model scores on this metric so readers can compare results across sources and model families.

Higher is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Notes |
|---|---|---|---|---|---|
| 01 | GPT-2-Large (prefix-tuning) | verified | 70.3 | 2021 | GPT-2 Large prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 02 | HTLM (fine-tuning) | verified | 70.3 | 2021 | From paper: HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 03 | HTLM (prefix-tuning) | verified | 70.1 | 2021 | HTLM prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 04 | GPT-2-Medium (prefix-tuning) | verified | 69.7 | 2021 | GPT-2 Medium prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 05 | GPT-2-Large (fine-tuning) | verified | 68.5 | 2021 | From paper: HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 06 | GPT-2-Medium (fine-tuning) | verified | 68.2 | 2021 | GPT-2 Medium fine-tuning on E2E NLG. From Table 3 in HTLM paper. |
| 07 | T5-base (STSM) | verified | 66.95 | 2024 | T5-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |
| 08 | BART-base (STSM) | verified | 65.74 | 2024 | BART-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |
| 09 | FLAN-T5-base (STSM) | verified | 65.65 | 2024 | FLAN-T5-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |
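
Ranks 01 and 02 in this table report the same score (70.3) but receive distinct rank numbers. When comparing models, it can help to treat such rows as tied. The sketch below shows one way to do that, assuming standard competition ranking ("1, 1, 3, ..."); the function name is hypothetical and this is not a description of how Codesota assigns ranks.

```python
def competition_ranks(scores):
    """Return 1-based ranks (higher score is better); equal scores share a rank."""
    ordered = sorted(scores, reverse=True)
    first_position = {}
    for position, score in enumerate(ordered, start=1):
        # Keep only the first (best) position seen for each distinct score.
        first_position.setdefault(score, position)
    return [first_position[score] for score in scores]


# BLEU scores copied from the table above, in listed order.
bleu_scores = [70.3, 70.3, 70.1, 69.7, 68.5, 68.2, 66.95, 65.74, 65.65]
bleu_ranks = competition_ranks(bleu_scores)
```

Under this assumed scheme the first two rows share rank 1 and the third row takes rank 3.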

METEOR

METEOR is one of the evaluation metrics reported for e2e. Codesota tracks published model scores on this metric so readers can compare results across sources and model families.

Higher is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Notes |
|---|---|---|---|---|---|
| 01 | HTLM (fine-tuning) | verified | 46.3 | 2021 | From paper: HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 02 | GPT-2-Large (prefix-tuning) | verified | 46.2 | 2021 | GPT-2 Large prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 03 | GPT-2-Medium (fine-tuning) | verified | 46.2 | 2021 | GPT-2 Medium fine-tuning on E2E NLG. From Table 3 in HTLM paper. |
| 04 | HTLM (prefix-tuning) | verified | 46.1 | 2021 | HTLM prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 05 | GPT-2-Medium (prefix-tuning) | verified | 46.1 | 2021 | GPT-2 Medium prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 06 | GPT-2-Large (fine-tuning) | verified | 46 | 2021 | From paper: HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 07 | T5-base (STSM) | verified | 45.7 | 2024 | T5-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |
| 08 | BART-base (STSM) | verified | 45.6 | 2024 | BART-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |
| 09 | FLAN-T5-base (STSM) | verified | 45.54 | 2024 | FLAN-T5-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |

NIST

NIST is one of the evaluation metrics reported for e2e. Codesota tracks published model scores on this metric so readers can compare results across sources and model families.

Higher is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Notes |
|---|---|---|---|---|---|
| 01 | HTLM (fine-tuning) | verified | 8.90 | 2021 | From paper: HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 02 | HTLM (prefix-tuning) | verified | 8.85 | 2021 | HTLM prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 03 | GPT-2-Large (prefix-tuning) | verified | 8.85 | 2021 | GPT-2 Large prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 04 | GPT-2-Medium (prefix-tuning) | verified | 8.81 | 2021 | GPT-2 Medium prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 05 | GPT-2-Large (fine-tuning) | verified | 8.78 | 2021 | From paper: HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 06 | GPT-2-Medium (fine-tuning) | verified | 8.62 | 2021 | GPT-2 Medium fine-tuning on E2E NLG. From Table 3 in HTLM paper. |
| 07 | T5-base (STSM) | verified | 8.59 | 2024 | T5-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |
| 08 | FLAN-T5-base (STSM) | verified | 8.49 | 2024 | FLAN-T5-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |
| 09 | BART-base (STSM) | verified | 8.46 | 2024 | BART-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |

CIDEr

CIDEr is one of the evaluation metrics reported for e2e. Codesota tracks published model scores on this metric so readers can compare results across sources and model families.

Higher is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | Score | Year | Notes |
|---|---|---|---|---|---|
| 01 | GPT-2-Medium (prefix-tuning) | verified | 2.49 | 2021 | GPT-2 Medium prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 02 | HTLM (fine-tuning) | verified | 2.47 | 2021 | From paper: HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 03 | GPT-2-Medium (fine-tuning) | verified | 2.47 | 2021 | GPT-2 Medium fine-tuning on E2E NLG. From Table 3 in HTLM paper. |
| 04 | GPT-2-Large (prefix-tuning) | verified | 2.47 | 2021 | GPT-2 Large prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 05 | GPT-2-Large (fine-tuning) | verified | 2.45 | 2021 | From paper: HTLM: Hyper-Text Pre-Training and Prompting of Language Models |
| 06 | HTLM (prefix-tuning) | verified | 2.45 | 2021 | HTLM prefix-tuning (0.1% params) on E2E NLG. From Table 3 in HTLM paper. |
| 07 | T5-base (STSM) | verified | 2.27 | 2024 | T5-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |
| 08 | BART-base (STSM) | verified | 2.20 | 2024 | BART-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |
| 09 | FLAN-T5-base (STSM) | verified | 2.12 | 2024 | FLAN-T5-base authors run (Table 4). From: Self-training from Self-memory in Data-to-text Generation (STSM), Jan 2024. |
§ 04 · Submit a result

Add to the leaderboard.
