andreaparker committed · Commit be6c444 · 1 Parent(s): 84b5548

Update README.md

Files changed (1): README.md (+21 -4)
README.md CHANGED
@@ -403,14 +403,31 @@ result = summarizer(
 ```


-**Important:** To generate the best quality summaries, you should use the global attention mask when decoding, as demonstrated in [this community notebook](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing); see the definition of `generate_answer(batch)`.
+**Note:** The global attention mask needs to be used when decoding to generate the best-quality summaries:
+- the community notebook is titled "🤗 20 lines of code to reproduce SOTA on arXiv with Longformer Encoder-Decoder (LED) 🤗"
+- https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing
+- see the notebook's `generate_answer` function for details (note the beam search); it is pasted below for easy reference

-If you have computing constraints, try the base version [`pszemraj/led-base-book-summary`](https://huggingface.co/pszemraj/led-base-book-summary)
-- all the parameters for generation on the API here are the same as [the base model](https://huggingface.co/pszemraj/led-base-book-summary) for easy comparison between versions.
+```python
+import torch
+
+def generate_answer(batch):
+    inputs_dict = tokenizer(batch["article"], padding="max_length", max_length=16384, return_tensors="pt", truncation=True)
+    input_ids = inputs_dict.input_ids.to("cuda")
+    attention_mask = inputs_dict.attention_mask.to("cuda")
+
+    global_attention_mask = torch.zeros_like(attention_mask)
+    # put global attention on the <s> token
+    global_attention_mask[:, 0] = 1
+
+    predicted_abstract_ids = model.generate(input_ids, attention_mask=attention_mask, global_attention_mask=global_attention_mask, max_length=512, num_beams=4)
+    batch["predicted_abstract"] = tokenizer.batch_decode(predicted_abstract_ids, skip_special_tokens=True)
+    return batch
+```

 ## Training and evaluation data

-- the [BookSum](https://arxiv.org/abs/2105.08209) dataset (this is what adds the `bsd-3-clause` license)
+- Trained on the [BookSum](https://arxiv.org/abs/2105.08209) dataset (this is what adds the `bsd-3-clause` license)
 - During training, the input text was the text of the `chapter`, and the output was `summary_text`
 - Eval results can be found [here](https://huggingface.co/datasets/autoevaluate/autoeval-staging-eval-project-kmfoda__booksum-79c1c0d8-10905463) with metrics on the sidebar.

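
The `generate_answer` function above assumes that `tokenizer` and `model` already exist and live on the GPU. A minimal setup sketch, assuming the base checkpoint `pszemraj/led-base-book-summary` named in the removed lines (this card's own checkpoint may differ):

```python
from transformers import AutoTokenizer, LEDForConditionalGeneration

# assumption: the base checkpoint named in this commit; substitute this card's checkpoint
checkpoint = "pszemraj/led-base-book-summary"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = LEDForConditionalGeneration.from_pretrained(checkpoint).to("cuda")

# `generate_answer` is the function pasted in the diff above; it reads batch["article"]
batch = {"article": ["<a long chapter or article to summarize>"]}
summary = generate_answer(batch)["predicted_abstract"][0]
print(summary)
```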
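
The training bullets pair `chapter` (input) with `summary_text` (target). A hedged sketch of inspecting that pairing, assuming the `kmfoda/booksum` Hub dataset implied by the eval-results link (field and split names are taken from the bullets and may differ):

```python
from datasets import load_dataset

# assumption: the kmfoda/booksum dataset implied by the eval-results link; split name may vary
booksum = load_dataset("kmfoda/booksum", split="train")

example = booksum[0]
print(example["chapter"][:300])  # input text during training
print(example["summary_text"])   # training target
```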