# DeepSeek-MoE-16B Q8_0 with CPU Offloading
Q8_0 quantization of DeepSeek-MoE-16B with support for offloading the MoE expert layers to CPU. Q8_0 is the highest-quality GGUF quantization, with near-F16 accuracy.
## Performance
| Configuration | VRAM Used | VRAM Saved | Reduction |
|---|---|---|---|
| All GPU | 17.11 GB | - | - |
| CPU Offload | 2.33 GB | 14.78 GB | 86.4% |
**File size:** 17 GB (down from 31 GB at F16)
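For reference, the reduction figure is just the VRAM saved as a fraction of the all-GPU baseline; a quick check against the numbers in the table above:

```python
# VRAM figures from the table above (GB).
all_gpu = 17.11
cpu_offload = 2.33

saved = all_gpu - cpu_offload        # 14.78 GB moved off the GPU
reduction = saved / all_gpu * 100    # ~86.4%
print(f"Saved {saved:.2f} GB ({reduction:.1f}% reduction)")
```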
## Usage
```bash
# Download the GGUF into ./models (requires huggingface-cli)
huggingface-cli download MikeKuykendall/deepseek-moe-16b-q8-0-cpu-offload-gguf --local-dir ./models

# Serve with the MoE expert tensors kept on CPU
shimmy serve --model-dirs ./models --cpu-moe
```
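Once the server is up, shimmy exposes an OpenAI-compatible API. A minimal client sketch, assuming shimmy's default bind of `localhost:11435`; the model name here is hypothetical, so substitute whatever name shimmy reports for the downloaded GGUF:

```python
import requests

# Hypothetical model name; use the name shimmy lists for the downloaded file.
MODEL = "deepseek-moe-16b-q8-0"

resp = requests.post(
    "http://localhost:11435/v1/chat/completions",  # assumed default bind address
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Explain MoE routing in one sentence."}],
        "max_tokens": 128,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```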
**License:** Apache 2.0
**Base model:** deepseek-ai/deepseek-moe-16b-base