Commit bc2764f
Parent: ff46155

fix typos (#6)

- fix typos (c2a5e573587885ce23744cf330ee7c402f0df16f)

Co-authored-by: George Ogden <[email protected]>

README.md CHANGED
```diff
@@ -42,7 +42,7 @@ interests you.
 
 Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
 to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at model like GPT2.
+generation you should look at a model like GPT2.
 
 ### How to use
 
@@ -166,14 +166,14 @@ The RoBERTa model was pretrained on the reunion of five datasets:
 - [Stories](https://arxiv.org/abs/1806.02847) a dataset containing a subset of CommonCrawl data filtered to match the
   story-like style of Winograd schemas.
 
-Together
+Together these datasets weigh 160GB of text.
 
 ## Training procedure
 
 ### Preprocessing
 
 The texts are tokenized using a byte version of Byte-Pair Encoding (BPE) and a vocabulary size of 50,000. The inputs of
-the model take pieces of 512 contiguous
+the model take pieces of 512 contiguous tokens that may span over documents. The beginning of a new document is marked
 with `<s>` and the end of one by `</s>`
 
 The details of the masking procedure for each sentence are the following:
```
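The sentence fixed in the first hunk points readers toward masked-language-model and fine-tuning use rather than text generation. The card's own "How to use" snippet is not part of this diff, so the sketch below is only an illustration of that intended usage; the `roberta-base` checkpoint name and the `transformers` fill-mask pipeline are assumptions here, not content from this commit.

```python
# Rough sketch of the masked-language-model usage the card points to.
# Assumption: the roberta-base checkpoint; this commit does not include
# the card's actual "How to use" code.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# RoBERTa's mask token is <mask>.
for prediction in fill_mask("The goal of life is <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

For text generation, as the fixed line says, an autoregressive model such as GPT2 is the better fit.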


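The preprocessing paragraph completed in the second hunk describes byte-level BPE with a 50,000-token vocabulary and 512-token input pieces delimited by `<s>` and `</s>`. Those properties can be inspected on the released tokenizer; a minimal sketch, again assuming the `roberta-base` checkpoint:

```python
# Inspect the tokenizer properties described in the preprocessing paragraph.
# Assumption: the roberta-base checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

print(tokenizer.vocab_size)        # byte-level BPE vocabulary of roughly 50k entries
print(tokenizer.model_max_length)  # 512, the piece length mentioned in the card
print(tokenizer.bos_token, tokenizer.eos_token)  # <s> </s>

encoded = tokenizer("A new document starts here.")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# The encoded sequence is wrapped as <s> ... </s>, matching the boundary
# markers described in the card.
```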