gghfez/DeepSeek-V3-0324-IQ2_KS

gghfez committed on Oct 18

Commit 529606c · verified · 1 Parent(s): 794c4a6

Create README.md

Files changed (1)

README.md +25 -0

README.md ADDED

	@@ -0,0 +1,25 @@

+---
+quantized_by: gghfez
+pipeline_tag: text-generation
+base_model: deepseek-ai/DeepSeek-V3-0324
+license: mit
+base_model_relation: quantized
+tags:
+- mla
+- imatrix
+- deepseek_v3
+- conversational
+- ik_llama.cpp
+---
+## `ik_llama.cpp` imatrix MLA Quantizations of DeepSeek-V3-0324
+This is an IQ2_KS quant of DeepSeek-V3-0324, made with [ubergarm](https://huggingface.co/ubergarm)'s IQ2_KS recipe from [ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF](https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF) and the imatrix file from [ubergarm/DeepSeek-V3-0324-GGUF](https://huggingface.co/ubergarm/DeepSeek-V3-0324-GGUF).
+This quant **REQUIRES** the [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp/) fork, which supports the advanced non-linear SotA quants and Multi-Head Latent Attention (MLA). Do **not** download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc.!
+See [ubergarm/DeepSeek-V3-0324-GGUF](https://huggingface.co/ubergarm/DeepSeek-V3-0324-GGUF) for his other quants and more details about them.
+I've uploaded the converted BF16 weights to [gghfez/DeepSeek-V3-0324-256x21B-BF16](https://huggingface.co/gghfez/DeepSeek-V3-0324-256x21B-BF16) in case I or anyone else wants to create similar quants in the future.
+TODO: fix links, etc. in the model card.
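
Not part of the commit: since the README warns against grabbing these large files casually, here is a minimal sketch of fetching only the weight files with `huggingface_hub.snapshot_download`. The repo ID is taken from this page's header; the `*.gguf` pattern and the `download_kwargs` / `download` helper names are assumptions for illustration, not something the model card specifies. The files still need an `ik_llama.cpp` build to run.

```python
# Repo name as shown at the top of this page.
REPO_ID = "gghfez/DeepSeek-V3-0324-IQ2_KS"


def download_kwargs(local_dir: str, pattern: str = "*.gguf") -> dict:
    """Build keyword arguments for huggingface_hub.snapshot_download.

    Hypothetical helper; assumes the weights are stored as .gguf files.
    """
    return {
        "repo_id": REPO_ID,
        "allow_patterns": [pattern],  # skip everything that does not match
        "local_dir": local_dir,
    }


def download(local_dir: str) -> str:
    """Download the matching files and return the local folder path."""
    # Imported lazily so download_kwargs works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(local_dir))


# Example (downloads a very large amount of data):
# download("models/DeepSeek-V3-0324-IQ2_KS")
```

Keeping the argument construction separate from the download call makes it easy to inspect what will be requested before committing to the transfer.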