---
license: gemma
---

[AWQ](https://arxiv.org/abs/2306.00978)-quantized package (W4G128) of [`google/gemma-2-2b`](https://huggingface.co/google/gemma-2-2b).
Support for Gemma 2 in AutoAWQ is proposed in [pull request #562](https://github.com/casper-hansen/AutoAWQ/pull/562).
To use the model with AutoAWQ, follow the usual AutoAWQ examples, installing the library from the source of [#562](https://github.com/casper-hansen/AutoAWQ/pull/562); a quantization sketch is shown below.
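
A W4G128 checkpoint like this one can be produced with the standard AutoAWQ quantization flow. This is a minimal sketch, assuming AutoAWQ is installed from the [#562](https://github.com/casper-hansen/AutoAWQ/pull/562) branch; the calibration data and exact settings used for this package are not documented here, so treat the values below as illustrative (`w_bit=4` and `q_group_size=128` correspond to W4G128):

```py
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_path = "google/gemma-2-2b"  # original checkpoint
quant_path = "gemma-2-2b-AWQ"    # output directory (illustrative)

# W4G128: 4-bit weights with group size 128
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base_path)
tokenizer = AutoTokenizer.from_pretrained(base_path)

# Calibrate and quantize, then save the quantized weights and tokenizer
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```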

**Evaluation**<br>
WikiText-2 PPL: 11.05<br>
C4 PPL: 12.99
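
A minimal sketch of how WikiText-2 perplexity can be measured with `transformers` and `datasets`; the non-overlapping 2048-token windows below are an assumption for illustration, not necessarily the exact protocol behind the numbers above:

```py
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "radi-cho/gemma-2-2b-AWQ"
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Tokenize the WikiText-2 test split as one long sequence
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

stride = 2048  # assumed window size
nlls = []
for begin in range(0, encodings.input_ids.size(1) - stride, stride):
    input_ids = encodings.input_ids[:, begin:begin + stride].to(model.device)
    with torch.no_grad():
        # With labels == inputs, the model returns the mean token-level cross-entropy
        nlls.append(model(input_ids, labels=input_ids).loss)

print("WikiText-2 PPL:", torch.exp(torch.stack(nlls).mean()).item())
```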

**Loading**

```py
model_path = "radi-cho/gemma-2-2b-AWQ"

# With transformers
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")

# With transformers, fusing the AWQ modules for faster inference
from transformers import AutoModelForCausalLM, AwqConfig
quantization_config = AwqConfig(bits=4, fuse_max_seq_len=512, do_fuse=True)
model = AutoModelForCausalLM.from_pretrained(model_path, quantization_config=quantization_config).to(0)

# With AutoAWQ
from awq import AutoAWQForCausalLM
model = AutoAWQForCausalLM.from_quantized(model_path)
```
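
**Example generation**

Once loaded, the model is used like any other `transformers` causal LM. A minimal sketch with the plain `transformers` path; the prompt and generation settings are illustrative:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "radi-cho/gemma-2-2b-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```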