Commit bc2764f
Parent: ff46155

fix typos (#6)

- fix typos (c2a5e573587885ce23744cf330ee7c402f0df16f)

Co-authored-by: George Ogden <[email protected]>

README.md CHANGED
```diff
@@ -42,7 +42,7 @@ interests you.
 
 Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
 to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at model like GPT2.
+generation you should look at a model like GPT2.
 
 ### How to use
 
@@ -166,14 +166,14 @@ The RoBERTa model was pretrained on the reunion of five datasets:
 - [Stories](https://arxiv.org/abs/1806.02847) a dataset containing a subset of CommonCrawl data filtered to match the
   story-like style of Winograd schemas.
 
-Together
+Together these datasets weigh 160GB of text.
 
 ## Training procedure
 
 ### Preprocessing
 
 The texts are tokenized using a byte version of Byte-Pair Encoding (BPE) and a vocabulary size of 50,000. The inputs of
-the model take pieces of 512 contiguous
+the model take pieces of 512 contiguous tokens that may span over documents. The beginning of a new document is marked
 with `<s>` and the end of one by `</s>`
 
 The details of the masking procedure for each sentence are the following:
```
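The sentence fixed in the first hunk points readers toward masked-language-model and fine-tuning use rather than text generation. The card's own "How to use" snippet is not part of this diff, so the sketch below is only an illustration of that intended usage; the `roberta-base` checkpoint name and the `transformers` fill-mask pipeline are assumptions here, not content from this commit.

```python
# Rough sketch of the masked-language-model usage the card points to.
# Assumption: the roberta-base checkpoint; this commit does not include
# the card's actual "How to use" code.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# RoBERTa's mask token is <mask>.
for prediction in fill_mask("The goal of life is <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

For text generation, as the fixed line says, an autoregressive model such as GPT2 is the better fit.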


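The preprocessing paragraph completed in the second hunk describes byte-level BPE with a 50,000-token vocabulary and 512-token input pieces delimited by `<s>` and `</s>`. Those properties can be inspected on the released tokenizer; a minimal sketch, again assuming the `roberta-base` checkpoint:

```python
# Inspect the tokenizer properties described in the preprocessing paragraph.
# Assumption: the roberta-base checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

print(tokenizer.vocab_size)        # byte-level BPE vocabulary of roughly 50k entries
print(tokenizer.model_max_length)  # 512, the piece length mentioned in the card
print(tokenizer.bos_token, tokenizer.eos_token)  # <s> </s>

encoded = tokenizer("A new document starts here.")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# The encoded sequence is wrapped as <s> ... </s>, matching the boundary
# markers described in the card.
```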