This is the INT4 Llama-3-8B model quantized by per-group QQQ with a group size of 128. QQQ is a hardware-optimized W4A8 quantization solution (4-bit weights, 8-bit activations). For more details, please refer to our code repo and our paper.
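To illustrate what per-group quantization with group size 128 means, here is a minimal NumPy sketch of symmetric per-group INT4 quantization. This is a simplified illustration of the general technique, not QQQ's actual kernel or packing format; the function names are hypothetical.

```python
import numpy as np

def quantize_per_group(w, group_size=128, n_bits=4):
    # Symmetric per-group quantization: each group of `group_size`
    # consecutive weights shares one FP scale. Simplified sketch,
    # not QQQ's exact scheme.
    qmax = 2 ** (n_bits - 1) - 1  # 7 for INT4
    groups = w.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    # Recover an FP approximation of the original weights.
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(4 * 128).astype(np.float32)
q, s = quantize_per_group(w, group_size=128)
w_hat = dequantize(q, s)
# Per-element reconstruction error is bounded by half of the group's scale.
print(np.abs(w - w_hat).max())
```

Each group of 128 weights stores one scale, so a finer group size trades memory for accuracy; 128 is a common balance point.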
