---
license: mit
base_model:
- deepseek-ai/DeepSeek-V3.1
---

Based on the Unsloth BF16 GGUF and imatrix file. The quantization types were not selected programmatically: I carefully checked every detail of the imatrix statistics and obtained quantization suggestions from Qwen3-235B-A22B, DeepSeek V3.1, Gemini 2.5 Pro, and ChatGPT.

Full protection of the first dense layers (blk.0-2). Full protection of the output tensor and embedding layer. Further compression is possible with llama.cpp.
## Quantization details

```
--output-tensor-type BF16
--token-embedding-type BF16

--tensor-type attn_k_b=MXFP4
--tensor-type blk.[0|1|2|3|4].attn_k_b=BF16

--tensor-type attn_kv_a_mqa=Q4_K
--tensor-type blk.[0|1|2].attn_kv_a_mqa=BF16

--tensor-type attn_output=IQ3_XXS
--tensor-type blk.[0|1|2|3|4|5].attn_output=BF16
--tensor-type blk.58.attn_output=Q5_K
--tensor-type blk.[59|60].attn_output=Q6_K

--tensor-type attn_q_a=Q4_K
--tensor-type blk.[0|1|2].attn_q_a=BF16

--tensor-type attn_q_b=Q4_K
--tensor-type blk.[0|1|2|3|4|5].attn_q_b=BF16
--tensor-type blk.6.attn_q_b=Q6_K

--tensor-type attn_v_b=Q6_K
--tensor-type blk.[0|1|2].attn_v_b=BF16

--tensor-type blk.[0|1|2].ffn_down=BF16
--tensor-type blk.[0|1|2].ffn_up=BF16
--tensor-type blk.[0|1|2].ffn_gate=BF16

--tensor-type ffn_gate_exps=IQ1_S
--tensor-type blk.[3|60].ffn_gate_exps=IQ2_XS

--tensor-type ffn_up_exps=IQ1_S
--tensor-type blk.[3|60].ffn_up_exps=IQ2_XS

--tensor-type ffn_gate_shexp=Q6_K
--tensor-type blk.[3|60].ffn_gate_shexp=BF16
--tensor-type ffn_up_shexp=Q6_K
--tensor-type blk.[3|60].ffn_up_shexp=BF16
--tensor-type ffn_down_shexp=Q6_K
--tensor-type blk.[3|60].ffn_down_shexp=BF16

--tensor-type ffn_down_exps=IQ1_S
--tensor-type blk.[3|4].ffn_down_exps=BF16
--tensor-type blk.[5|6|7|8|9|33|46|59|60].ffn_down_exps=MXFP4
--tensor-type blk.[25-38,40-45].ffn_down_exps=IQ2_XS
--tensor-type blk.39.ffn_down_exps=IQ2_S
```
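For reference, a minimal sketch of how these flags feed into llama.cpp's `llama-quantize` tool. The input/output file names and the trailing IQ1_S fallback type are assumptions, not taken from this card; any tensor without an explicit `--tensor-type` rule falls back to that positional type.

```bash
#!/usr/bin/env bash
# Sketch of the quantization command, assuming the Unsloth BF16 GGUF is at
# DeepSeek-V3.1-BF16.gguf and the imatrix at imatrix.dat (hypothetical paths).
OVERRIDES=(
  --output-tensor-type BF16
  --token-embedding-type BF16
  --tensor-type attn_k_b=MXFP4
  --tensor-type 'blk.[0|1|2|3|4].attn_k_b=BF16'
  # ...append the remaining --tensor-type rules from the list above...
)
./llama-quantize --imatrix imatrix.dat "${OVERRIDES[@]}" \
  DeepSeek-V3.1-BF16.gguf DeepSeek-V3.1-quant.gguf IQ1_S
```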