DeepSeek-MoE-16B Q8_0 with CPU Offloading

Q8_0 quantization of DeepSeek-MoE-16B with support for offloading MoE expert tensors to CPU memory. Highest quality in this series, with near-F16 accuracy.

Performance

Configuration   VRAM used   VRAM saved   Reduction
All GPU         17.11 GB    -            -
CPU Offload     2.33 GB     14.78 GB     86.4%

File Size: 17 GB (from 31 GB F16)
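
To reproduce the VRAM figures above, start the server with and without --cpu-moe and watch GPU memory from another terminal. A minimal check, assuming an NVIDIA GPU with nvidia-smi on the PATH:

# Report current GPU memory usage (run while the model is loaded)
nvidia-smi --query-gpu=memory.used --format=csv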

Usage

huggingface-cli download MikeKuykendall/deepseek-moe-16b-q8-0-cpu-offload-gguf --local-dir ./models
shimmy serve --model-dirs ./models --cpu-moe
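
Once the server is up, you can send a request from another terminal. A minimal sketch, assuming shimmy exposes an OpenAI-compatible chat completions endpoint; the port (11435) and model name below are assumptions, so check shimmy's startup log or documentation for the values on your install:

# Hypothetical request; adjust port and model name to your setup
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-moe-16b-q8-0", "messages": [{"role": "user", "content": "Hello"}]}'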

Other quantizations in this series: Q2_K | Q4_K_M

License: Apache 2.0
