**Format:** GGUF | **Runtime:** Ollama / llama.cpp | **Created:** October 2025
</div>

## Hardware Requirements

KAT-Dev 72B is a large coding model. Choose your quantization based on available VRAM/RAM:

| Quantization | Model Size | VRAM Required | Quality |
|:------------:|:----------:|:-------------:|:--------|
| **Q2_K** | ~27 GB | 32 GB | Acceptable |
| **Q3_K_M** | ~34 GB | 40 GB | Good |
| **Q4_K_M** | ~42 GB | 48 GB | Very good (recommended) |
| **Q5_K_M** | ~50 GB | 56 GB | Excellent |
| **Q6_K** | ~58 GB | 64 GB | Near original |
| **Q8_0** | ~77 GB | 80 GB | Original quality |

### Recommended Setups

| Hardware | Recommended Quantization |
|:---------|:-------------------------|
| RTX 4090 (24 GB) | Q2_K with CPU offloading |
| 2x RTX 4090 (48 GB) | Q4_K_M |
| A100 (80 GB) | Q8_0 |
| Mac Studio M2 Ultra (192 GB) | Q8_0 via llama.cpp |