Update README.md

---
license: gemma
---

[AWQ](https://arxiv.org/abs/2306.00978)-quantized checkpoint (W4G128: 4-bit weights, group size 128) of [`google/gemma-2-2b`](https://huggingface.co/google/gemma-2-2b).
Support for Gemma2 in the AutoAWQ codebase is proposed in [pull request #562](https://github.com/casper-hansen/AutoAWQ/pull/562).
To use the model, follow the AutoAWQ examples with the source from [#562](https://github.com/casper-hansen/AutoAWQ/pull/562).
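
Producing a checkpoint like this one follows AutoAWQ's standard quantization flow (run from the [#562](https://github.com/casper-hansen/AutoAWQ/pull/562) branch, since Gemma2 support is not in a release at the time of writing). A minimal sketch; only W4G128 is stated on this card, so the `zero_point` and `version` settings below are assumptions:

```py
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_path = "google/gemma-2-2b"
quant_path = "gemma-2-2b-awq"  # hypothetical local output directory

# W4G128: 4-bit weights, quantization groups of 128 weights
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base_path)
tokenizer = AutoTokenizer.from_pretrained(base_path)

model.quantize(tokenizer, quant_config=quant_config)  # runs activation-aware calibration
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```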

**Evaluation**<br>
WikiText-2 PPL: 11.05<br>
C4 PPL: 12.99

**Loading**

```py
# Three alternative ways to load the quantized checkpoint; pick one.
model_path = "radi-cho/gemma-2-2b-AWQ"

# With transformers
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")

# With transformers (fused modules for faster decoding)
from transformers import AutoModelForCausalLM, AwqConfig
quantization_config = AwqConfig(bits=4, fuse_max_seq_len=512, do_fuse=True)
model = AutoModelForCausalLM.from_pretrained(model_path, quantization_config=quantization_config).to(0)

# With AutoAWQ
from awq import AutoAWQForCausalLM
model = AutoAWQForCausalLM.from_quantized(model_path)
```
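
Once loaded through any of the paths above (the sketch below assumes the plain `transformers` path, so `model.device` is set), generation works as with any causal LM:

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path)
inputs = tokenizer("The key idea of activation-aware quantization is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```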
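
The card does not state the exact protocol behind the perplexity numbers above. A common recipe, shown here as a hypothetical sketch reusing the `model` and `tokenizer` from the previous snippet, scores non-overlapping fixed-length windows of the WikiText-2 test split (the dataset name and 2048-token window are assumptions):

```py
import torch
from datasets import load_dataset

# Concatenate the test split and score it in non-overlapping windows;
# labels=chunk makes the model return the mean next-token NLL per window.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids
window, nlls = 2048, []
for start in range(0, ids.size(1) - window + 1, window):
    chunk = ids[:, start : start + window].to(model.device)
    with torch.no_grad():
        nlls.append(model(chunk, labels=chunk).loss)
print(f"PPL: {torch.exp(torch.stack(nlls).mean()).item():.2f}")
```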