This is the INT4 Llama-3-8B model quantized by per-group QQQ with a group size of 128. QQQ is a hardware-optimized W4A8 quantization solution (4-bit weights, 8-bit activations). For more details, please refer to our code repo and our paper.
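To illustrate what per-group quantization with group size 128 means, here is a minimal NumPy sketch of symmetric per-group INT4 quantization. This is a simplified illustration of the general technique, not QQQ's actual kernel or packing format; the function names are hypothetical.

```python
import numpy as np

def quantize_per_group(w, group_size=128, n_bits=4):
    # Symmetric per-group quantization: each group of `group_size`
    # consecutive weights shares one FP scale. Simplified sketch,
    # not QQQ's exact scheme.
    qmax = 2 ** (n_bits - 1) - 1  # 7 for INT4
    groups = w.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    # Recover an FP approximation of the original weights.
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(4 * 128).astype(np.float32)
q, s = quantize_per_group(w, group_size=128)
w_hat = dequantize(q, s)
# Per-element reconstruction error is bounded by half of the group's scale.
print(np.abs(w - w_hat).max())
```

Each group of 128 weights stores one scale, so a finer group size trades memory for accuracy; 128 is a common balance point.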
