---
base_model:
- meta-llama/Llama-3.1-70B-Instruct
base_model_relation: quantized
license: llama3.1
---

# Model Card

- Base model: `meta-llama/Llama-3.1-70B-Instruct`
- Quantization method: BlockLDLQ with GuidedQuant Hessian
- Target bit-width: 3
- Backend kernel: QTIP kernel (HYB variant)
- Calibration data: RedPajama (1024 sentences / 4096 tokens)
- Calibration objective: Next-token prediction
- num_groups (for GuidedQuant Hessian): 1
- skip_list: 0_v (the 0_v layer is left unquantized, following the YAQA paper)

# How to run

- Follow the instructions in https://github.com/snu-mllab/GuidedQuant and https://github.com/Cornell-RelaxML/qtip (a minimal inference sketch is given at the end of this card).

# References

- [Model Paper](https://arxiv.org/abs/2505.07004)
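
# Example

A minimal inference sketch. The QTIP HYB kernels are not part of stock `transformers`, so the checkpoint is loaded through the qtip repository's own loader. The loader path (`lib.utils.unsafe_import.model_from_hf_path`), its return signature, and the `REPO_ID` placeholder are assumptions, not confirmed by this card; consult the two repositories' READMEs for the actual entry points.

```python
# Minimal sketch, assuming it is run from the root of a cloned
# https://github.com/Cornell-RelaxML/qtip checkout with its requirements
# installed. The loader import below is an assumption about the repo
# layout; verify against the qtip README.
import torch
from transformers import AutoTokenizer

from lib.utils.unsafe_import import model_from_hf_path  # assumed entry point

REPO_ID = "..."  # hypothetical: the Hugging Face Hub id of this checkpoint

# Load the 3-bit QTIP-format weights (the loader wires up the HYB kernels)
# and the tokenizer of the original base model.
model, model_str = model_from_hf_path(REPO_ID)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")

prompt = "Explain 3-bit weight quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```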