Update README.md

README.md CHANGED

@@ -1,3 +1,12 @@
+---
+license: mit
+language:
+- en
+base_model:
+- microsoft/GRIN-MoE
+- microsoft/Phi-3.5-MoE-instruct
+pipeline_tag: text-generation
+---
## Model Summary

Phi-mini-MoE is a lightweight, state-of-the-art open Mixture-of-Experts (MoE) model with 7.6B total parameters and 2.4B activated parameters, compressed and distilled from Phi-3.5-MoE using [SlimMoE](http://link.to.slimmoe). It was trained on the Phi-3 datasets, which include both synthetic data and filtered, publicly available website data, with a focus on high-quality, reasoning-dense content. The model underwent a post-training process that incorporates both supervised fine-tuning and direct preference optimization for instruction following and safety. Phi-mini-MoE belongs to the SlimMoE series, which also includes a smaller model, Phi-tiny-MoE, with 3.8B total parameters and 1.1B activated parameters.
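
As a quick orientation for readers of the updated card, here is a minimal sketch of loading the model through the Hugging Face `transformers` API. The hub id `microsoft/Phi-mini-MoE-instruct`, the dtype, and the generation settings are illustrative assumptions rather than details taken from this README; the commented `attn_implementation` line reflects the note later in the card that flash attention is used by default.

```python
# Minimal loading sketch (assumptions: hub id, dtype, and prompt are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-mini-MoE-instruct"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # The card notes flash attention is used by default; on hardware without
    # flash-attn installed, fall back to the eager implementation:
    # attn_implementation="eager",
    trust_remote_code=True,  # may be required depending on the transformers version
)

# Build a chat-formatted prompt and generate a short completion.
messages = [{"role": "user", "content": "Summarize what a Mixture-of-Experts model is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```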

@@ -161,4 +170,4 @@ Note that by default, the Phi-mini-MoE model uses flash attention, which require
The model is licensed under the [MIT license](./LICENSE).
|
## Trademarks
+This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.