Text Generation
Transformers
Safetensors
English
phimoe
conversational
custom_code
cliang1453 committed · verified
Commit bd3840a · 1 Parent(s): 509c1ad

Update README.md

Files changed (1)
  1. README.md +10 -1
README.md CHANGED
@@ -1,3 +1,12 @@
+ ---
+ license: mit
+ language:
+ - en
+ base_model:
+ - microsoft/GRIN-MoE
+ - microsoft/Phi-3.5-MoE-instruct
+ pipeline_tag: text-generation
+ ---
  ## Model Summary

  Phi-mini-MoE is a lightweight, state-of-the-art open Mixture-of-Experts (MoE) model with 7.6B total parameters and 2.4B activated parameters, compressed and distilled from Phi-3.5-MoE using [SlimMoE](http://link.to.slimmoe). The training process uses the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model has undergone a post-training process that incorporates both supervised fine-tuning and direct preference optimization for instruction following and safety. The model belongs to the SlimMoE series, which also includes a smaller model, Phi-tiny-MoE, with 3.8B total parameters and 1.1B activated parameters.
@@ -161,4 +170,4 @@ Note that by default, the Phi-mini-MoE model uses flash attention, which require
  The model is licensed under the [MIT license](./LICENSE).

  ## Trademarks
- This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.
+ This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.
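
The YAML block added at the top of README.md is Hugging Face model-card front matter; the Hub parses it for the license, language, base models, and pipeline tag. As a rough sanity check, a sketch like the following (using `huggingface_hub` and assuming the edited `README.md` sits in the working directory) can confirm the metadata parses as intended:

```python
# Minimal sketch: parse the model-card front matter that this commit adds.
# Assumes huggingface_hub is installed and README.md is in the current directory.
from huggingface_hub import ModelCard

card = ModelCard.load("README.md")  # also accepts a Hub repo id

# These fields mirror the YAML keys added in this commit.
print(card.data.license)       # "mit"
print(card.data.language)      # ["en"]
print(card.data.base_model)    # ["microsoft/GRIN-MoE", "microsoft/Phi-3.5-MoE-instruct"]
print(card.data.pipeline_tag)  # "text-generation"
```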
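
Since the front matter tags the model for text generation and the second hunk notes that flash attention is used by default, a minimal loading sketch with `transformers` might look like the following. The repo id `microsoft/Phi-mini-MoE-instruct`, the bfloat16 dtype, and the fallback to eager attention are assumptions, not taken from this commit:

```python
# Minimal sketch for loading Phi-mini-MoE with transformers.
# Assumptions: repo id, bfloat16 dtype, and eager attention as a fallback
# when flash-attn is unavailable; none of these come from the commit itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-mini-MoE-instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,        # the card is tagged custom_code / phimoe
    attn_implementation="eager",   # swap for "flash_attention_2" if flash-attn is installed
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what a Mixture-of-Experts model is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```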