Update README.md
README.md
CHANGED
@@ -3,3 +3,13 @@ license: other
 license_name: llama3
 license_link: LICENSE
 ---
+
+The original Llama 3 8B (base) special token weights are zero, which might cause NaN gradients. This version re-initializes the weights of the following special tokens to alleviate the problem.
+
+```
+<|eot_id|>
+<|start_header_id|>
+<|end_header_id|>
+```
+
+We set the weights of these tokens in `embed` and `lm_head` to the mean of the weights of all other tokens.
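
The mean re-initialization described in the README text can be illustrated with a minimal sketch. This is not the script used for this commit; it assumes the weight matrix is a plain list of rows, and the function name `reinit_rows_to_mean` and the toy token ids are hypothetical. It replaces each listed row with the column-wise mean of all rows that are not listed.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def reinit_rows_to_mean(weights: Matrix, token_ids: Sequence[int]) -> Matrix:
    """Return a copy of `weights` in which every row listed in `token_ids`
    is replaced by the column-wise mean of all remaining rows."""
    special = set(token_ids)
    others = [row for i, row in enumerate(weights) if i not in special]
    if not others:
        raise ValueError("need at least one non-special row to average")
    dim = len(weights[0])
    # Column-wise mean over the rows that are NOT being re-initialized.
    mean_row = [sum(row[j] for row in others) / len(others) for j in range(dim)]
    return [list(mean_row) if i in special else list(row)
            for i, row in enumerate(weights)]


# Toy 5-token vocabulary with 2-dimensional embeddings. Ids 3 and 4 stand in
# for special tokens whose rows are all zero (hypothetical ids, not the real
# Llama 3 token ids).
embed = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0], [0.0, 0.0]]
fixed = reinit_rows_to_mean(embed, token_ids=[3, 4])
print(fixed[3])  # [3.0, 4.0]
```

For a real checkpoint the same operation would be applied once to the input embedding matrix and once to the output projection, using the actual ids of the three special tokens from the tokenizer.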