Update README.md
README.md CHANGED

@@ -127,7 +127,7 @@ print(output[0]['generated_text'])
 Note that by default the model uses flash attention, which requires certain types of GPU to run. If you want to run the model on:
 
 + V100 or earlier generation GPUs: call `AutoModelForCausalLM.from_pretrained()` with `attn_implementation="eager"`
-+ Optimized inference: use the **ONNX** models [128K](https://aka.ms/phi3-mini-128k-instruct-onnx)
++ Optimized inference on GPU, CPU, and Mobile: use the **ONNX** models [128K](https://aka.ms/phi3-mini-128k-instruct-onnx)
 
 ## Responsible AI Considerations
 
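For context on the lines this hunk touches, here is a minimal sketch of the eager-attention fallback the README recommends for V100-or-earlier GPUs. The Hugging Face model id `microsoft/Phi-3-mini-128k-instruct` is an assumption inferred from the ONNX link above, not stated in this diff; substitute whatever checkpoint you are actually loading.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Assumed model id, inferred from the aka.ms/phi3-mini-128k-instruct-onnx link;
# replace with the checkpoint you intend to run.
model_id = "microsoft/Phi-3-mini-128k-instruct"

# Flash attention (the default, per the README) is unsupported on V100 and
# earlier GPUs, so request the eager attention implementation instead.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    attn_implementation="eager",  # fallback from the default flash attention
    device_map="auto",            # requires the `accelerate` package
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Same style of generation call as the hunk header's surrounding example.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
output = pipe("Explain why flash attention needs newer GPUs.", max_new_tokens=64)
print(output[0]["generated_text"])
```

For the "Optimized inference" bullet, the linked ONNX builds are run through ONNX Runtime rather than `transformers`; this sketch intentionally stops short of that path rather than guess at its API.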

