radi-cho committed
Commit 3a01aab · verified · Parent(s): 75069b9

Update README.md
Files changed (1): README.md (+30 -3)
README.md CHANGED

---
license: gemma
---

[AWQ](https://arxiv.org/abs/2306.00978)-quantized package (W4G128: 4-bit weights, group size 128) of [`google/gemma-2-2b`](https://huggingface.co/google/gemma-2-2b).
Support for Gemma2 in the AutoAWQ codebase is proposed in [pull request #562](https://github.com/casper-hansen/AutoAWQ/pull/562).
To use the model, follow the AutoAWQ examples with the source from [#562](https://github.com/casper-hansen/AutoAWQ/pull/562).
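
For reference, a W4G128 package like this one is typically produced with the AutoAWQ quantization API. The sketch below assumes the standard AutoAWQ workflow (with the Gemma2 support from #562 installed); the output path and calibration defaults are illustrative, not the exact recipe used here.

```py
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base = "google/gemma-2-2b"
quant_path = "gemma-2-2b-AWQ"  # hypothetical output directory

# W4G128: 4-bit weights, AWQ group size 128
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Calibrate on AutoAWQ's default calibration data, then quantize in place
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```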

**Evaluation**<br>
WikiText-2 perplexity (PPL): 11.05<br>
C4 PPL: 12.99
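
These numbers can be sanity-checked with a standard chunked perplexity loop. A minimal sketch, assuming non-overlapping 2048-token chunks over the raw WikiText-2 test split (the exact evaluation protocol behind the figures above is not specified):

```py
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "radi-cho/gemma-2-2b-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids

max_length = 2048  # assumed evaluation context length
nlls, n_targets = [], 0
for begin in range(0, ids.size(1), max_length):
    chunk = ids[:, begin : begin + max_length].to(model.device)
    if chunk.size(1) < 2:
        break
    with torch.no_grad():
        # labels == input_ids: transformers shifts labels internally for next-token loss
        loss = model(chunk, labels=chunk).loss
    nlls.append(loss.float() * (chunk.size(1) - 1))  # mean NLL -> summed NLL
    n_targets += chunk.size(1) - 1

print(torch.exp(torch.stack(nlls).sum() / n_targets))  # perplexity
```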

**Loading**

```py
model_path = "radi-cho/gemma-2-2b-AWQ"

# With transformers (loads the pre-quantized AWQ weights; requires autoawq installed)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")

# With transformers, fusing AWQ modules for faster inference
from transformers import AutoModelForCausalLM, AwqConfig
quantization_config = AwqConfig(bits=4, fuse_max_seq_len=512, do_fuse=True)
model = AutoModelForCausalLM.from_pretrained(model_path, quantization_config=quantization_config).to(0)

# With AutoAWQ directly
from awq import AutoAWQForCausalLM
model = AutoAWQForCausalLM.from_quantized(model_path)
```
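
Once loaded, the model behaves like any `transformers` causal LM. A minimal generation sketch (prompt and decoding settings are arbitrary):

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "radi-cho/gemma-2-2b-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")

inputs = tokenizer("Activation-aware weight quantization works by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```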