andreaparker committed · Commit be6c444 · 1 Parent(s): 84b5548

Update README.md

Files changed (1): README.md (+21 -4)
README.md CHANGED
@@ -403,14 +403,31 @@ result = summarizer(
 ```


-**Important:** To generate the best quality summaries, you should use the global attention mask when decoding, as demonstrated in [this community notebook](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing); see the definition of `generate_answer(batch)`.
+**Note:** The global attention mask needs to be used when decoding to generate the best-quality summaries:
+- the community notebook is titled "🤗 20 lines of code to reproduce SOTA on arXiv with Longformer Encoder-Decoder (LED) 🤗"
+- https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing
+- see the notebook's `generate_answer` function for details (note the beam search); it is pasted below for easy reference

-If you have computing constraints, try the base version [`pszemraj/led-base-book-summary`](https://huggingface.co/pszemraj/led-base-book-summary)
-- all the parameters for generation on the API here are the same as [the base model](https://huggingface.co/pszemraj/led-base-book-summary) for easy comparison between versions.
+```python
+import torch
+
+def generate_answer(batch):
+    inputs_dict = tokenizer(batch["article"], padding="max_length", max_length=16384, return_tensors="pt", truncation=True)
+    input_ids = inputs_dict.input_ids.to("cuda")
+    attention_mask = inputs_dict.attention_mask.to("cuda")
+
+    global_attention_mask = torch.zeros_like(attention_mask)
+    # put global attention on the <s> token
+    global_attention_mask[:, 0] = 1
+
+    predicted_abstract_ids = model.generate(input_ids, attention_mask=attention_mask, global_attention_mask=global_attention_mask, max_length=512, num_beams=4)
+    batch["predicted_abstract"] = tokenizer.batch_decode(predicted_abstract_ids, skip_special_tokens=True)
+    return batch
+```

 ## Training and evaluation data

-- the [BookSum](https://arxiv.org/abs/2105.08209) dataset (this is what adds the `bsd-3-clause` license)
+- Trained on the [BookSum](https://arxiv.org/abs/2105.08209) dataset (this is what adds the `bsd-3-clause` license)
 - During training, the input text was the text of the `chapter`, and the output was `summary_text`
 - Eval results can be found [here](https://huggingface.co/datasets/autoevaluate/autoeval-staging-eval-project-kmfoda__booksum-79c1c0d8-10905463) with metrics on the sidebar.

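
The `generate_answer` function above assumes that `tokenizer` and `model` already exist and live on the GPU. A minimal setup sketch, assuming the base checkpoint `pszemraj/led-base-book-summary` named in the removed lines (this card's own checkpoint may differ):

```python
from transformers import AutoTokenizer, LEDForConditionalGeneration

# assumption: the base checkpoint named in this commit; substitute this card's checkpoint
checkpoint = "pszemraj/led-base-book-summary"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = LEDForConditionalGeneration.from_pretrained(checkpoint).to("cuda")

# `generate_answer` is the function pasted in the diff above; it reads batch["article"]
batch = {"article": ["<a long chapter or article to summarize>"]}
summary = generate_answer(batch)["predicted_abstract"][0]
print(summary)
```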
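
The training bullets pair `chapter` (input) with `summary_text` (target). A hedged sketch of inspecting that pairing, assuming the `kmfoda/booksum` Hub dataset implied by the eval-results link (field and split names are taken from the bullets and may differ):

```python
from datasets import load_dataset

# assumption: the kmfoda/booksum dataset implied by the eval-results link; split name may vary
booksum = load_dataset("kmfoda/booksum", split="train")

example = booksum[0]
print(example["chapter"][:300])  # input text during training
print(example["summary_text"])   # training target
```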