Update README.md
README.md CHANGED
@@ -20,6 +20,8 @@ tags:
## Model Overview
Huihui-MoE-23B-A4B is a **Mixture of Experts (MoE)** language model developed by **huihui.ai**, built upon the **[Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)** base model. It enhances the standard Transformer architecture by replacing MLP layers with MoE layers, each containing 8 experts, to achieve high performance with efficient inference. The model is designed for natural language processing tasks, including text generation, question answering, and conversational applications.
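The snippet below is a minimal sketch (not taken from this card) of loading the model for chat-style generation with Hugging Face `transformers`; the repository id `huihui-ai/Huihui-MoE-23B-A4B`, the prompt, and the generation settings are assumptions for illustration.

```python
# Minimal sketch: load Huihui-MoE-23B-A4B and run one chat turn.
# Repository id, dtype/device choices and prompt are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huihui-ai/Huihui-MoE-23B-A4B"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick bf16/fp16 automatically when available
    device_map="auto",    # spread the 23B parameters across available devices
)

messages = [{"role": "user", "content": "Give me a short introduction to Mixture of Experts models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```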

+The corresponding abliterated version is [huihui-ai/Huihui-MoE-23B-A4B-abliterated](https://huggingface.co/huihui-ai/Huihui-MoE-23B-A4B-abliterated).
+
**Note**:
The number of activated experts can be set anywhere from 1 to 8, and the model can hold normal conversations at any of these settings.
You can change the activation parameter with `/num_experts_per_tok <number>`; after the parameter is modified, the model is reloaded.
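
The `/num_experts_per_tok` command appears to belong to an interactive chat script; the sketch below only illustrates the underlying idea of reloading the model with a different number of activated experts. The `reload_with_experts` helper is hypothetical, and the `num_experts_per_tok` config field is assumed to exist as in Qwen-style MoE configurations.

```python
# Rough sketch (assumption, not the card's code): reload the model with a
# different number of activated experts, mirroring /num_experts_per_tok <number>.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "huihui-ai/Huihui-MoE-23B-A4B"  # assumed repository id


def reload_with_experts(num_experts_per_tok: int):
    """Reload the model so that `num_experts_per_tok` experts (1-8) are activated per token."""
    if not 1 <= num_experts_per_tok <= 8:
        raise ValueError("num_experts_per_tok must be between 1 and 8")
    config = AutoConfig.from_pretrained(model_id)
    config.num_experts_per_tok = num_experts_per_tok  # assumed config field name
    return AutoModelForCausalLM.from_pretrained(
        model_id, config=config, torch_dtype="auto", device_map="auto"
    )


# e.g. the user typed "/num_experts_per_tok 2" in the chat loop
model = reload_with_experts(2)
```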