kshitijthakkar committed · verified
Commit d6228bf · 1 Parent(s): a429c3a

Update README.md

Files changed (1): README.md (+6 -1)

README.md CHANGED:
@@ -107,6 +107,7 @@ with torch.no_grad():
 ---
 🔧 Expert Routing
 ---
+
 This model uses a top-2 gating mechanism where, for each token, two of the eight experts are selected based on learned router logits.
 
 During training, a light auxiliary loss was applied to encourage balanced expert usage and improve routing stability.
@@ -116,12 +117,15 @@ Note: Routing logits are optionally available in the model outputs via output_router_logits
 ---
 📃 License
 ---
+
 This model is released under the Apache 2.0 License.
+
 ---
 🙌 Acknowledgements
 ---
 Trained using:
 ---
+
 🧨 Hugging Face Transformers
 
 🧠 Custom training loop with gradient checkpointing
@@ -129,9 +133,10 @@ Trained using:
 🧮 NVIDIA RTX 4090 (24GB VRAM) / A100 (40GB)
 
 📦 Logged and tracked via Weights & Biases
+
 ---
-### 🗣️ Citation
 
+### 🗣️ Citation
 ---
 @misc{loggenix-moe-0.12B-A0.08B-e5-lr5e4-b4-3060,
   title = {loggenix-moe-0.12B-A0.08B-e5-lr5e4-b4-3060: A Lightweight Mixture-of-Experts Model},
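
For readers skimming this commit, the Expert Routing section touched above describes top-2 gating over eight experts plus a light auxiliary loss for balanced expert usage. The snippet below is a minimal, hypothetical PyTorch sketch of that mechanism, not the model's actual routing code: the function names, the Switch/Mixtral-style balancing formula, and the toy dimensions are assumptions made for illustration.

```python
# Minimal sketch of top-2 expert routing with a load-balancing auxiliary loss.
# The 8-expert / top-2 setup follows the README's description; the exact layer
# shapes, loss formula, and loss weight in the released model may differ.
import torch
import torch.nn.functional as F


def top2_route(hidden_states: torch.Tensor, router_weight: torch.Tensor):
    """hidden_states: (tokens, d_model); router_weight: (num_experts, d_model)."""
    router_logits = hidden_states @ router_weight.t()      # (tokens, num_experts)
    probs = F.softmax(router_logits, dim=-1)
    top2_probs, top2_idx = probs.topk(2, dim=-1)            # two experts per token
    top2_probs = top2_probs / top2_probs.sum(dim=-1, keepdim=True)  # renormalize gates
    return router_logits, top2_probs, top2_idx


def load_balancing_loss(router_logits: torch.Tensor, top2_idx: torch.Tensor,
                        num_experts: int = 8) -> torch.Tensor:
    """Switch/Mixtral-style auxiliary term that penalizes uneven expert usage."""
    probs = F.softmax(router_logits, dim=-1)
    mean_prob = probs.mean(dim=0)                            # avg router prob per expert
    one_hot = F.one_hot(top2_idx, num_experts).float().sum(dim=1)  # (tokens, num_experts)
    mean_load = one_hot.mean(dim=0) / 2.0                    # each token makes 2 assignments
    return num_experts * torch.sum(mean_prob * mean_load)


if __name__ == "__main__":
    torch.manual_seed(0)
    d_model, num_experts, tokens = 64, 8, 16
    x = torch.randn(tokens, d_model)
    w_router = torch.randn(num_experts, d_model) * 0.02
    logits, gates, experts = top2_route(x, w_router)
    aux = load_balancing_loss(logits, experts, num_experts)
    print(experts.shape, gates.shape, aux.item())            # (16, 2), (16, 2), scalar
```

As the second hunk's context line notes, the released model can surface its real router logits by passing output_router_logits=True, so a sketch like this is only needed to interpret what those logits represent.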