Update README.md
README.md CHANGED
````diff
@@ -131,32 +131,6 @@ print(tokenizer.decode(outputs[0]))
 
 **Important Note:** Models based on Gemma 2 such as BgGPT-Gemma-2-2.6B-IT-v1.0 do not support flash attention. Using it results in degraded performance.
 
-```python
-tokenizer = AutoTokenizer.from_pretrained(
-    "INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0",
-    use_default_system_prompt=False,
-)
-
-messages = [
-    {"role": "user", "content": "Кога е основан Софийският университет?"},
-]
-
-input_ids = tokenizer.apply_chat_template(
-    messages,
-    return_tensors="pt",
-    add_generation_prompt=True,
-    return_dict=True
-)
-
-outputs = model.generate(
-    **input_ids,
-    generation_config=generation_params
-)
-print(tokenizer.decode(outputs[0]))
-```
-
-**Important Note:** Models based on Gemma 2 such as BgGPT-Gemma-2-2.6B-IT-v1.0 do not support flash attention. Using it results in degraded performance.
-
 # Use with vLLM
 
 Example usage with vLLM:
````
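The snippet removed above was a duplicate of the quickstart earlier in the README and assumed a `model` and `generation_params` defined there. As context for the flash-attention note it sat next to, here is a minimal, self-contained sketch of the same flow; the `attn_implementation="eager"` flag, dtype, and generation settings are illustrative assumptions (eager attention is the standard `transformers` way to opt out of flash attention), not the README's exact configuration. The Bulgarian prompt asks "When was Sofia University founded?".

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_id = "INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_default_system_prompt=False)

# Assumption for illustration: Gemma 2 models should avoid flash attention,
# so load with the default eager implementation.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="eager",
)

# Illustrative generation settings, not the README's actual generation_params.
generation_params = GenerationConfig(max_new_tokens=256, do_sample=False)

# "When was Sofia University founded?"
messages = [{"role": "user", "content": "Кога е основан Софийският университет?"}]

input_ids = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
    return_dict=True,
).to(model.device)

outputs = model.generate(**input_ids, generation_config=generation_params)
print(tokenizer.decode(outputs[0]))
```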
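The vLLM example introduced by the final context line falls outside this hunk. A rough sketch of what such usage typically looks like, assuming vLLM's offline `LLM`/`SamplingParams` API and illustrative sampling values rather than the README's actual snippet:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id)

# Illustrative sampling settings, not taken from the README.
sampling_params = SamplingParams(temperature=0.1, max_tokens=256)

# "When was Sofia University founded?"
messages = [{"role": "user", "content": "Кога е основан Софийският университет?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```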