Update README.md
README.md
CHANGED
@@ -107,6 +107,7 @@ with torch.no_grad():
---
🧠 Expert Routing
---

This model uses a top-2 gating mechanism where, for each token, two of the eight experts are selected based on learned router logits.

During training, a light auxiliary loss was applied to encourage balanced expert usage and improve routing stability.
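For readers unfamiliar with the mechanism, the sketch below illustrates the general idea of top-2 routing with an auxiliary load-balancing term. It is not the model's actual routing code: the function name, tensor shapes, and the exact form of the balancing loss are assumptions in the spirit of common MoE implementations.

```python
import torch
import torch.nn.functional as F

def route_top2(hidden, router_weight, num_experts=8):
    """Illustrative top-2 routing: pick 2 of `num_experts` experts per token.

    hidden:        (num_tokens, hidden_dim) token representations
    router_weight: (hidden_dim, num_experts) learned router projection
    """
    router_logits = hidden @ router_weight                       # (tokens, experts)
    probs = F.softmax(router_logits, dim=-1)
    top2_probs, top2_idx = probs.topk(2, dim=-1)                  # 2 experts per token
    top2_probs = top2_probs / top2_probs.sum(-1, keepdim=True)    # renormalize weights

    # Assumed load-balancing term (Switch/Mixtral-style): push the fraction of
    # tokens dispatched to each expert toward the mean router probability.
    tokens_per_expert = F.one_hot(top2_idx, num_experts).float().mean(dim=(0, 1))
    mean_probs = probs.mean(dim=0)
    aux_loss = num_experts * torch.sum(tokens_per_expert * mean_probs)
    return top2_idx, top2_probs, aux_loss

# Toy usage
hidden = torch.randn(16, 64)        # 16 tokens, hidden size 64
router_weight = torch.randn(64, 8)  # 8 experts
idx, weights, aux = route_top2(hidden, router_weight)
print(idx.shape, weights.shape, aux.item())
```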
@@ -116,12 +117,15 @@ Note: Routing logits are optionally available in the model outputs via output_router_logits.
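A rough usage sketch of that option is shown below. It assumes the checkpoint follows the usual Transformers MoE convention, where passing `output_router_logits=True` adds a per-layer `router_logits` tuple to the outputs; the repo id and attribute name are assumptions, not confirmed details of this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "loggenix-moe-0.12B-A0.08B-e5-lr5e4-b4-3060"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Routing example", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_router_logits=True)

# One (num_tokens, num_experts) logit tensor per MoE layer; top-2 per token.
for layer_idx, logits in enumerate(out.router_logits):
    top2 = logits.topk(2, dim=-1).indices
    print(f"layer {layer_idx}: experts chosen per token -> {top2.tolist()}")
```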
---
📄 License
---

This model is released under the Apache 2.0 License.

---
🙏 Acknowledgements
---

Trained using:
---

🧨 Hugging Face Transformers

🔧 Custom training loop with gradient checkpointing (see the sketch after this list)
@@ -129,9 +133,10 @@ Trained using:

🧮 NVIDIA RTX 4090 (24GB VRAM) / A100 (40GB)

📦 Logged and tracked via Weights & Biases
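Gradient checkpointing, mentioned in the list above, trades compute for memory by recomputing activations during the backward pass. The snippet below is a minimal sketch of how it is typically enabled with Transformers; the repo id, data, and hyperparameters are placeholders, not the actual training setup used for this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "loggenix-moe-0.12B-A0.08B-e5-lr5e4-b4-3060"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

model.gradient_checkpointing_enable()  # recompute activations to save VRAM
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

batch = tokenizer(["example training text"], return_tensors="pt")
outputs = model(**batch, labels=batch["input_ids"])  # causal LM loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```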

---

### 🗣️ Citation
---
@misc{loggenix-moe-0.12B-A0.08B-e5-lr5e4-b4-3060,
  title = {loggenix-moe-0.12B-A0.08B-e5-lr5e4-b4-3060: A Lightweight Mixture-of-Experts Model},