DeepSeek-MoE-16B Q8_0 with CPU Offloading

Q8_0 quantization of DeepSeek-MoE-16B with support for offloading MoE expert tensors to CPU memory. Highest quality in this series, with near-F16 accuracy.

Performance

Configuration   VRAM used   VRAM saved   Reduction
All GPU         17.11 GB    -            -
CPU Offload     2.33 GB     14.78 GB     86.4%

File Size: 17 GB (from 31 GB F16)
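
To reproduce the VRAM figures above, start the server with and without --cpu-moe and watch GPU memory from another terminal. A minimal check, assuming an NVIDIA GPU with nvidia-smi on the PATH:

# Report current GPU memory usage (run while the model is loaded)
nvidia-smi --query-gpu=memory.used --format=csv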

Usage

huggingface-cli download MikeKuykendall/deepseek-moe-16b-q8-0-cpu-offload-gguf --local-dir ./models
shimmy serve --model-dirs ./models --cpu-moe
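
Once the server is up, you can send a request from another terminal. A minimal sketch, assuming shimmy exposes an OpenAI-compatible chat completions endpoint; the port (11435) and model name below are assumptions, so check shimmy's startup log or documentation for the values on your install:

# Hypothetical request; adjust port and model name to your setup
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-moe-16b-q8-0", "messages": [{"role": "user", "content": "Hello"}]}'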

Other quantizations in this series: Q2_K | Q4_K_M

License: Apache 2.0
