Update README.md

README.md CHANGED

@@ -1,3 +1,12 @@
+---
+license: mit
+language:
+- en
+base_model:
+- microsoft/GRIN-MoE
+- microsoft/Phi-3.5-MoE-instruct
+pipeline_tag: text-generation
+---
## Model Summary

Phi-mini-MoE is a lightweight, state-of-the-art open Mixture-of-Experts (MoE) model with 7.6B total parameters and 2.4B activated parameters, compressed and distilled from Phi-3.5-MoE using [SlimMoE](http://link.to.slimmoe). It was trained on the Phi-3 datasets, which include both synthetic data and filtered, publicly available website data, with a focus on high-quality, reasoning-dense content. The model underwent a post-training process that incorporates both supervised fine-tuning and direct preference optimization for instruction following and safety. Phi-mini-MoE belongs to the SlimMoE series, which also includes a smaller model, Phi-tiny-MoE, with 3.8B total parameters and 1.1B activated parameters.
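
As a quick orientation for readers of the updated card, here is a minimal sketch of loading the model through the Hugging Face `transformers` API. The hub id `microsoft/Phi-mini-MoE-instruct`, the dtype, and the generation settings are illustrative assumptions rather than details taken from this README; the commented `attn_implementation` line reflects the note later in the card that flash attention is used by default.

```python
# Minimal loading sketch (assumptions: hub id, dtype, and prompt are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-mini-MoE-instruct"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # The card notes flash attention is used by default; on hardware without
    # flash-attn installed, fall back to the eager implementation:
    # attn_implementation="eager",
    trust_remote_code=True,  # may be required depending on the transformers version
)

# Build a chat-formatted prompt and generate a short completion.
messages = [{"role": "user", "content": "Summarize what a Mixture-of-Experts model is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```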

@@ -161,4 +170,4 @@ Note that by default, the Phi-mini-MoE model uses flash attention, which require
The model is licensed under the [MIT license](./LICENSE).
|
## Trademarks
+This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.