Update README.md

README.md (changed):

---
license: apache-2.0
---

<div align="center">
<div> </div>
<img src="https://cdn-uploads.huggingface.co/production/uploads/641de0213239b631552713e4/ieHnwuczidNNoGRA_FN2y.png" width="500"/>
<img src="https://cdn-uploads.huggingface.co/production/uploads/641de0213239b631552713e4/UOsk9_zcbHpCCy6kmryYM.png" width="530"/>
</div>

# JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars

## Key Messages

1. JetMoE-8B is **trained at a cost of under $0.1 million**<sup>1</sup> **yet outperforms LLaMA2-7B from Meta AI**, which has multi-billion-dollar training resources. LLM training can be **much cheaper than people generally think**.

We use the same evaluation methodology as in the Open LLM leaderboard. For MBPP …

To our surprise, despite the lower training cost and computation, JetMoE-8B performs even better than LLaMA2-7B, LLaMA-13B, and DeepseekMoE-16B. Compared to a model with similar training and inference computation, like Gemma-2B, JetMoE-8B achieves better performance.
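
The Open LLM Leaderboard methodology referenced above is built on EleutherAI's lm-evaluation-harness, so a comparable run can be sketched with its Python API. This is only an illustrative, non-authoritative example: the Hub id `jetmoe/jetmoe-8b`, the task subset, and the batch size are assumptions, not settings taken from this README.

```python
# Hedged sketch of a harness-style evaluation (pip install lm-eval).
# The checkpoint id and task list below are illustrative assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=jetmoe/jetmoe-8b,trust_remote_code=True,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "winogrande", "gsm8k"],
    batch_size=8,
)
print(results["results"])  # per-task scores, one entry per task
```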
## Model Usage

To load the models, you need to install [this package](https://github.com/myshell-ai/JetMoE) from a local clone of the repository:
```
pip install -e .
```
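
Once the package is installed, the checkpoint can be loaded through the standard `transformers` auto classes. The snippet below is a minimal sketch rather than the repository's canonical usage: the Hub id `jetmoe/jetmoe-8b`, the `trust_remote_code=True` flag, and the generation settings are assumptions to adjust to your environment.

```python
# Minimal sketch: load JetMoE-8B and generate a short completion.
# Assumptions (not taken from this README): the Hub repo id "jetmoe/jetmoe-8b"
# and that the installed package / transformers version can resolve the
# JetMoE architecture (hence trust_remote_code=True as a fallback).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jetmoe/jetmoe-8b"  # assumed repo id; adjust if different
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # keep the MoE weights in bf16
    device_map="auto",
    trust_remote_code=True,
)

prompt = "JetMoE is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

`device_map="auto"` lets accelerate place the weights across the available GPUs (and CPU, if needed) when a single device cannot hold the full set of expert parameters.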
|