---
base_model:
- meta-llama/Llama-3.2-3B-Instruct
base_model_relation: quantized
license: llama3.2
---
|
|
# Model Card |
|
|
|
|
|
- Base model: `meta-llama/Llama-3.2-3B-Instruct`
- Quantization method: BlockLDLQ with the GuidedQuant Hessian
- Target bit-width: 4 bits
- Backend kernel: QTIP kernel (HYB variant)
- Calibration data: RedPajama (1024 sequences of 4096 tokens each)
- Calibration objective: next-token prediction
- `num_groups` (for the GuidedQuant Hessian): 1 (all settings are collected in the sketch below)
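
For reference, the settings above can be gathered into a single configuration mapping. This is a sketch only: the key names below are hypothetical and are not the actual argument names used by the GuidedQuant or QTIP scripts.

```python
# Hypothetical summary of the quantization settings listed above.
# Key names are illustrative, not GuidedQuant/QTIP CLI arguments.
quant_config = {
    "base_model": "meta-llama/Llama-3.2-3B-Instruct",
    "method": "BlockLDLQ",            # block quantization with LDLQ-style error feedback
    "hessian": "GuidedQuant",         # end-loss-guided Hessian estimate
    "bits": 4,                        # target bit-width per weight
    "kernel": "QTIP-HYB",             # inference kernel variant
    "calibration_data": "RedPajama",  # 1024 sequences of 4096 tokens each
    "calibration_objective": "next-token prediction",
    "num_groups": 1,                  # groups used for the GuidedQuant Hessian
}
```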
|
|
|
|
|
# How to run |
|
|
- Follow the instructions in the [GuidedQuant](https://github.com/snu-mllab/GuidedQuant) and [QTIP](https://github.com/Cornell-RelaxML/qtip) repositories. A hedged loading sketch follows below.
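
As a minimal sketch, assuming the QTIP inference kernels from the repositories above are installed and that this checkpoint can be loaded through the standard Hugging Face interface, generation would look roughly like the following. The repository ID is a placeholder and the plain `transformers` loading path is an assumption; the repos may instead require their own model-loading utilities, so consult the linked READMEs for the supported workflow.

```python
# Illustrative only: assumes the QTIP CUDA kernels and the GuidedQuant/QTIP
# Python packages are installed per the linked READMEs. The repo ID below is
# a placeholder, and loading via plain transformers is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<this-repo-id>"  # hypothetical: replace with this model repository's ID

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # QTIP kernels operate on fp16 activations
    device_map="auto",
    trust_remote_code=True,     # assumption: custom quantized-layer code may ship with the repo
)

prompt = "Explain weight-only quantization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```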
|
|
|
|
|
# References |
|
|
- [GuidedQuant paper](https://arxiv.org/abs/2505.07004)