base_model: zai-org/GLM-4.6
tags:
- mlx
---

**See GLM-4.6 6.5bit MLX in action - [demonstration video - coming soon](https://www.youtube.com/xcreate)**

*q6.5bit quant typically achieves the lowest perplexity in our testing*
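Perplexity here is the standard language-model quality metric (lower is better): the exponential of the average negative log-likelihood per token. A minimal sketch of how it is computed, for reference:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Example: three tokens each predicted with probability 0.5
print(perplexity([math.log(0.5)] * 3))  # → 2.0
```

A model that assigned probability 1.0 to every token would score a perplexity of exactly 1, the theoretical floor.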
## Usage Notes

* Runs on a single M3 Ultra with 512 GB RAM using the [Inferencer app](https://inferencer.com)
* Memory usage: ~270 GB
* Expect ~16 tokens/s
* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.27
* For more details see the [demonstration video - coming soon](https://www.youtube.com/xcreate) or visit [GLM-4.6](https://huggingface.co/zai-org/GLM-4.6).
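As a rough sanity check on the memory figure, a quantized model's weight footprint scales as parameters × bits per weight. The ~355B parameter count below is an assumption, not stated in this card, and real footprints differ because per-layer bit widths vary and quantization metadata (scales, biases) adds overhead:

```python
def quant_size_gb(n_params_billions, bits_per_weight):
    """Back-of-envelope quantized model size:
    parameters × bits-per-weight, converted from bits to gigabytes."""
    return n_params_billions * 1e9 * bits_per_weight / 8 / 1e9

# Assuming ~355B total parameters (assumption, not from this card),
# a flat 6.5-bit quant lands in the same ballpark as ~270 GB:
print(round(quant_size_gb(355, 6.5)))  # → 288
```

The gap between this estimate and the reported ~270 GB is expected: "6.5bit" quants typically mix bit widths across layers rather than applying a uniform rate.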