Codesota · Models · LongT50 results · 0 benchmarks
Model card

LongT5.

Text SummarizationSummarizationApache 2.0

Long document summarization. 16K+ tokens.

§ 01 · Card

Model card,
inline.

Rendered server-side from the upstream README on Hugging Face — same content as the source repo, with editorial typography. The full card, sample weights, and revision history live on HF.


Source
google/long-t5-tglobal-large
License
apache-2.0
Language
en

LongT5 (transient-global attention, large-sized model)

LongT5 model pre-trained on English language. The model was introduced in the paper LongT5: Efficient Text-To-Text Transformer for Long Sequences by Guo et al. and first released in the LongT5 repository. All the model architecture and configuration can be found in Flaxformer repository which uses another Google research project repository T5x.

Disclaimer: The team releasing LongT5 did not write a model card for this model so this model card has been written by the Hugging Face team.

Model description

LongT5 model is an encoder-decoder transformer pre-trained in a text-to-text denoising generative setting (Pegasus-like generation pre-training). LongT5 model is an extension of T5 model, and it enables using one of the two different efficient attention mechanisms - (1) Local attention, or (2) Transient-Global attention. The usage of attention sparsity patterns allows the model to efficiently handle input sequence.

LongT5 is particularly effective when fine-tuned for text generation (summarization, question answering) which requires handling long input sequences (up to 16,384 tokens).

Results of LongT5 (transient-global attention, large-sized model) fine-tuned on multiple (summarization, QA) tasks.

| Dataset | Rouge-1 | Rouge-2 | Rouge-Lsum | | --- | --- | --- | --- | | arXiv (16k input) | 48.28 | 21.63 | 44.11 | | PubMed (16k input) | 49.98 | 24.69 | 46.46 | | BigPatent (16k input) | 70.38 | 56.81 | 62.73 | | MultiNews (8k input) | 47.18 | 18.44 | 24.18 | | MediaSum (4k input) | 35.54 | 19.04 | 32.20 | | CNN / DailyMail (4k input) | 42.49 | 20.51 | 40.18 |

| Dataset | EM | F1 | | --- | --- | --- | | Natural Questions (4k input) | 60.77 | 65.38 | | Trivia QA (16k input) | 78.38 | 82.45 |

Intended uses & limitations

The model is mostly meant to be fine-tuned on a supervised dataset. See the model hub to look for fine-tuned versions on a task that interests you.

How to use

python
from transformers import AutoTokenizer, LongT5Model tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-large") model = LongT5Model.from_pretrained("google/long-t5-tglobal-large") inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") outputs = model(**inputs) last_hidden_states = outputs.last_hidden_state

BibTeX entry and citation info

bibtex
@article{guo2021longt5, title={LongT5: Efficient Text-To-Text Transformer for Long Sequences}, author={Guo, Mandy and Ainslie, Joshua and Uthus, David and Ontanon, Santiago and Ni, Jianmo and Sung, Yun-Hsuan and Yang, Yinfei}, journal={arXiv preprint arXiv:2112.07916}, year={2021} }
Card content reproduced from huggingface.co/google/long-t5-tglobal-large under the upstream license. Rendering trims fenced HTML, raw widgets and tables for safety; tap the link for the untouched original.
§ 02 · Benchmarks

No recorded benchmark results yet.

This model is in the registry but doesn’t have any benchmark_results rows yet. If you have a score, submit it →

Rank column shows this model’s position vs all other models scored on the same benchmark + metric (competitors after the slash). #1 in red means current SOTA. Sorted by rank, then newest result.