---
license: mit
base_model:
- deepseek-ai/DeepSeek-V3.1
---

Based on the Unsloth BF16 GGUF and imatrix file. The quantization types were not selected programmatically: I carefully checked every detail of the imatrix statistics and obtained quantization suggestions from Qwen3-235B-A22B, DeepSeek V3.1, Gemini 2.5 Pro, and ChatGPT.

Full protection of the first dense layers (blk.0-2). Full protection of the output tensor and embedding layer. Further compression is possible with llama.cpp.
## Quantization details

```
--output-tensor-type BF16
--token-embedding-type BF16

--tensor-type attn_k_b=MXFP4
--tensor-type blk.[0|1|2|3|4].attn_k_b=BF16

--tensor-type attn_kv_a_mqa=Q4_K
--tensor-type blk.[0|1|2].attn_kv_a_mqa=BF16

--tensor-type attn_output=IQ3_XXS
--tensor-type blk.[0|1|2|3|4|5].attn_output=BF16
--tensor-type blk.58.attn_output=Q5_K
--tensor-type blk.[59|60].attn_output=Q6_K

--tensor-type attn_q_a=Q4_K
--tensor-type blk.[0|1|2].attn_q_a=BF16

--tensor-type attn_q_b=Q4_K
--tensor-type blk.[0|1|2|3|4|5].attn_q_b=BF16
--tensor-type blk.6.attn_q_b=Q6_K

--tensor-type attn_v_b=Q6_K
--tensor-type blk.[0|1|2].attn_v_b=BF16

--tensor-type blk.[0|1|2].ffn_down=BF16
--tensor-type blk.[0|1|2].ffn_up=BF16
--tensor-type blk.[0|1|2].ffn_gate=BF16

--tensor-type ffn_gate_exps=IQ1_S
--tensor-type blk.[3|60].ffn_gate_exps=IQ2_XS

--tensor-type ffn_up_exps=IQ1_S
--tensor-type blk.[3|60].ffn_up_exps=IQ2_XS

--tensor-type ffn_gate_shexp=Q6_K
--tensor-type blk.[3|60].ffn_gate_shexp=BF16
--tensor-type ffn_up_shexp=Q6_K
--tensor-type blk.[3|60].ffn_up_shexp=BF16
--tensor-type ffn_down_shexp=Q6_K
--tensor-type blk.[3|60].ffn_down_shexp=BF16

--tensor-type ffn_down_exps=IQ1_S
--tensor-type blk.[3|4].ffn_down_exps=BF16
--tensor-type blk.[5|6|7|8|9|33|46|59|60].ffn_down_exps=MXFP4
--tensor-type blk.[25-38,40-45].ffn_down_exps=IQ2_XS
--tensor-type blk.39.ffn_down_exps=IQ2_S
```
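For reference, a minimal sketch of how these flags feed into llama.cpp's `llama-quantize` tool. The input/output file names and the trailing IQ1_S fallback type are assumptions, not taken from this card; any tensor without an explicit `--tensor-type` rule falls back to that positional type.

```bash
#!/usr/bin/env bash
# Sketch of the quantization command, assuming the Unsloth BF16 GGUF is at
# DeepSeek-V3.1-BF16.gguf and the imatrix at imatrix.dat (hypothetical paths).
OVERRIDES=(
  --output-tensor-type BF16
  --token-embedding-type BF16
  --tensor-type attn_k_b=MXFP4
  --tensor-type 'blk.[0|1|2|3|4].attn_k_b=BF16'
  # ...append the remaining --tensor-type rules from the list above...
)
./llama-quantize --imatrix imatrix.dat "${OVERRIDES[@]}" \
  DeepSeek-V3.1-BF16.gguf DeepSeek-V3.1-quant.gguf IQ1_S
```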