Dataset from Papers With Code
This benchmark appears abandoned or is no longer actively evaluated by the community.
45 results indexed across 5 metrics (BLEU, CIDEr, METEOR, NIST, ROUGE-L). The top-ranked row in each table is the current SOTA for that metric; ties are broken by submission date.
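The ranking rule above (higher score first, earlier submission wins ties) can be sketched as a compound sort key. The entries and dates below are illustrative placeholders, not values taken from the leaderboard:

```python
from datetime import date

# Hypothetical entries: (model name, metric score, submission date).
entries = [
    ("model-a", 70.30, date(2021, 7, 15)),
    ("model-b", 70.30, date(2021, 7, 1)),   # same score, earlier submission
    ("model-c", 70.10, date(2021, 7, 1)),
]

# Sort by score descending; break ties by earlier submission date.
ranked = sorted(entries, key=lambda e: (-e[1], e[2]))
```

With this key, `model-b` outranks `model-a` despite the identical score, because it was submitted first.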
| # | Model | Org | Submitted | Paper / code | BLEU |
|---|---|---|---|---|---|
| 01 | GPT-2-Large (prefix-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 70.30 |
| 02 | HTLM (fine-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 70.30 |
| 03 | HTLM (prefix-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 70.10 |
| 04 | GPT-2-Medium (prefix-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 69.70 |
| 05 | GPT-2-Large (fine-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 68.50 |
| 06 | GPT-2-Medium (fine-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 68.20 |
| 07 | T5-base (STSM) | — | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 66.95 |
| 08 | BART-base (STSM) | Meta | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 65.74 |
| 09 | FLAN-T5-base (STSM) | — | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 65.65 |
| # | Model | Org | Submitted | Paper / code | CIDEr |
|---|---|---|---|---|---|
| 01 | GPT-2-Medium (prefix-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 2.49 |
| 02 | GPT-2-Large (prefix-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 2.47 |
| 03 | GPT-2-Medium (fine-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 2.47 |
| 04 | HTLM (fine-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 2.47 |
| 05 | GPT-2-Large (fine-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 2.45 |
| 06 | HTLM (prefix-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 2.45 |
| 07 | T5-base (STSM) | — | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 2.27 |
| 08 | BART-base (STSM) | Meta | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 2.20 |
| 09 | FLAN-T5-base (STSM) | — | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 2.12 |
| # | Model | Org | Submitted | Paper / code | METEOR |
|---|---|---|---|---|---|
| 01 | HTLM (fine-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 46.30 |
| 02 | GPT-2-Large (prefix-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 46.20 |
| 03 | GPT-2-Medium (fine-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 46.20 |
| 04 | HTLM (prefix-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 46.10 |
| 05 | GPT-2-Medium (prefix-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 46.10 |
| 06 | GPT-2-Large (fine-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 46.00 |
| 07 | T5-base (STSM) | — | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 45.70 |
| 08 | BART-base (STSM) | Meta | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 45.60 |
| 09 | FLAN-T5-base (STSM) | — | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 45.54 |
| # | Model | Org | Submitted | Paper / code | NIST |
|---|---|---|---|---|---|
| 01 | HTLM (fine-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 8.90 |
| 02 | HTLM (prefix-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 8.85 |
| 03 | GPT-2-Large (prefix-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 8.85 |
| 04 | GPT-2-Medium (prefix-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 8.81 |
| 05 | GPT-2-Large (fine-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 8.78 |
| 06 | GPT-2-Medium (fine-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 8.62 |
| 07 | T5-base (STSM) | — | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 8.59 |
| 08 | FLAN-T5-base (STSM) | — | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 8.49 |
| 09 | BART-base (STSM) | Meta | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 8.46 |
| # | Model | Org | Submitted | Paper / code | ROUGE-L |
|---|---|---|---|---|---|
| 01 | GPT-2-Large (prefix-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 71.70 |
| 02 | GPT-2-Medium (prefix-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 71.40 |
| 03 | HTLM (prefix-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 71.20 |
| 04 | GPT-2-Medium (fine-tuning) | OpenAI | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 71.00 |
| 05 | HTLM (fine-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 70.80 |
| 06 | GPT-2-Large (fine-tuning) | — | Jul 2021 | HTLM: Hyper-Text Pre-Training and Prompting of Language … | 69.90 |
| 07 | T5-base (STSM) | — | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 68.97 |
| 08 | BART-base (STSM) | Meta | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 68.76 |
| 09 | FLAN-T5-base (STSM) | — | Jan 2024 | Self-training from Self-memory in Data-to-text Generatio… | 67.85 |
Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.
Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.
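BLEU, the primary metric in the tables above, is a corpus-level score combining clipped n-gram precision with a brevity penalty. The sketch below is a minimal single-reference implementation for intuition only; it is not the leaderboard's scoring script, whose tokenization and smoothing choices can shift the numbers:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU (0-100) with uniform weights, one reference per hypothesis,
    and no smoothing: any n-gram order with zero matches yields a score of 0."""
    hyp_len = ref_len = 0
    matches = [0] * max_n
    totals = [0] * max_n
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum((h & r).values())     # clipped match counts
            totals[n - 1] += max(len(hyp) - n + 1, 0)   # candidate n-grams
    if min(matches) == 0:
        return 0.0
    # Geometric mean of n-gram precisions, in log space.
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)
```

A hypothesis identical to its reference scores 100; differences in n-gram overlap or length reduce the score.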