richardyoung committed on
Commit 5f76079 · verified · 1 Parent(s): eab066b

Upload README.md with huggingface_hub

Files changed (1): README.md (+24 -0)
README.md CHANGED
@@ -308,3 +308,27 @@ done
 **Format:** GGUF | **Runtime:** Ollama / llama.cpp | **Created:** October 2025

 </div>
+
+
+## Hardware Requirements
+
+KAT-Dev 72B is a large coding model. Choose your quantization based on available VRAM/RAM:
+
+| Quantization | Model Size | VRAM Required | Quality |
+|:------------:|:----------:|:-------------:|:--------|
+| **Q2_K** | ~27 GB | 32 GB | Acceptable |
+| **Q3_K_M** | ~34 GB | 40 GB | Good |
+| **Q4_K_M** | ~42 GB | 48 GB | Very good (recommended) |
+| **Q5_K_M** | ~50 GB | 56 GB | Excellent |
+| **Q6_K** | ~58 GB | 64 GB | Near-original |
+| **Q8_0** | ~77 GB | 80 GB | Original quality |
+
+### Recommended Setups
+
+| Hardware | Recommended Quantization |
+|:---------|:-------------------------|
+| RTX 4090 (24 GB) | Q2_K with offloading |
+| 2x RTX 4090 (48 GB) | Q4_K_M |
+| A100 (80 GB) | Q8_0 |
+| Mac Studio M2 Ultra (192 GB) | Q8_0 via llama.cpp |
+