Update README.md
README.md
CHANGED
@@ -11,9 +11,9 @@ tags:
 # 🔥 Quantized Model: Mistral-Small-24B-Instruct-2501_gptq_g32_4bit 🔥
 
 This is a 4-bit quantized version of the [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501) model, quantized by [ConfidentialMind.com](https://www.confidentialmind.com) 🤖✨
-It leverages the open-source GPTQModel quantization to achieve 4-bit precision with a group size of
+It leverages the open-source GPTQModel quantization to achieve 4-bit precision with a group size of 32, resulting in a
 smaller,
-faster model with minimal performance degradation.
+faster model with minimal performance degradation. The G128 variant used MSE loss to avoid performance degradation.
 
 Ran on a single NVIDIA A100 GPU with 80GB of VRAM.
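For context, a minimal sketch of what a GPTQModel run with these settings (4-bit, group size 32) might look like. This is an illustration, not the authors' actual quantization script: the calibration texts and output path are placeholders, and the MSE option mentioned for the G128 variant is not shown.

```python
# Hypothetical sketch of a GPTQModel 4-bit quantization run (group size 32).
# Uses GPTQModel's load/quantize/save flow; calibration data and the output
# directory below are illustrative placeholders, not the original setup.
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(
    bits=4,         # 4-bit precision
    group_size=32,  # g32, as in the model name
)

model = GPTQModel.load(
    "mistralai/Mistral-Small-24B-Instruct-2501",
    quant_config,
)

# A real run would use a substantial calibration dataset.
calibration = [
    "Example calibration text one.",
    "Example calibration text two.",
]

model.quantize(calibration)
model.save("Mistral-Small-24B-Instruct-2501_gptq_g32_4bit")
```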