jusjinuk/Llama-3.2-3B-Instruct-4bit-GuidedQuant-QTIP

metadata

base_model:
  - meta-llama/Llama-3.2-3B-Instruct
base_model_relation: quantized
license: llama3.2

Model Card

Base model: meta-llama/Llama-3.2-3B-Instruct
Quantization method: BlockLDLQ with GuidedQuant Hessian
Target bit-width: 4
Backend kernel: QTIP kernel (HYB variant)
Calibration data: RedPajama (1024 sentences / 4096 tokens)
Calibration objective: Next-token prediction
num_groups (for GuidedQuant Hessian): 1
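
As a rough illustration of what a 4-bit target means for storage, the sketch below estimates weight memory at 16 bits versus 4 bits. The parameter count is a hypothetical round figure for a "3B" model, not a number taken from this card, and the estimate assumes every weight is quantized and ignores codebooks, scales, and any layers kept in higher precision.

```python
def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Bytes needed to store num_params weights at the given bit-width."""
    return num_params * bits_per_weight / 8


# Assumption: a round 3e9 parameters, used only for illustration.
ASSUMED_PARAMS = 3_000_000_000

GIB = 2**30
fp16_gib = weight_bytes(ASSUMED_PARAMS, 16) / GIB  # assumed 16-bit baseline
q4_gib = weight_bytes(ASSUMED_PARAMS, 4) / GIB     # 4-bit target from this card
ratio = fp16_gib / q4_gib

print(f"16-bit: {fp16_gib:.2f} GiB, 4-bit: {q4_gib:.2f} GiB, ratio: {ratio:.1f}x")
```

The real on-disk size of this checkpoint may differ, since quantized formats usually carry some per-layer overhead.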

How to run

Follow the instructions in the GuidedQuant repository (https://github.com/snu-mllab/GuidedQuant) and the QTIP repository (https://github.com/Cornell-RelaxML/qtip).
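
Before following those instructions, the checkpoint files usually need to be available locally. A minimal sketch, assuming the `huggingface_hub` package is installed and that a plain file download is all that is needed at this stage; the helper name `download_checkpoint` and the `./checkpoint` directory are hypothetical. Loading and running the model still requires the code from the two repositories above.

```python
# Repository ID as shown at the top of this model card.
REPO_ID = "jusjinuk/Llama-3.2-3B-Instruct-4bit-GuidedQuant-QTIP"


def download_checkpoint(local_dir: str = "./checkpoint") -> str:
    """Download all files of the model repository and return the local path.

    Hypothetical helper: it only fetches files and does not load the model.
    """
    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    print(download_checkpoint())
```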

References

Model Paper