---
language:
- en
license: apache-2.0
tags:
- quantization
- sinq
- int3
- efficient-inference
- text-generation
- qwen
- llm
- compression
---

<p align="center" style="margin:0;">
  <img src="logo.png" width="150" style="margin:0; padding:0;"/>
</p>

<p align="center">🐙 <a href="https://github.com/huawei-csl/SINQ">GitHub</a>&nbsp;&nbsp; | &nbsp;&nbsp;📄 <a href="http://arxiv.org/abs/2509.22944">Paper</a></p>

# SINQ 3-bit Quantized Qwen3-14B Model

This repository contains the official **3-bit quantized** version of the [`Qwen3-14B`](https://huggingface.co/Qwen/Qwen3-14B) model, produced with the **SINQ (Sinkhorn-Normalized Quantization)** method.
SINQ is a novel, fast, and high-quality quantization method designed to make any large language model smaller while keeping its accuracy almost intact.

To support the project, please star ⭐ the official [SINQ](https://github.com/huawei-csl/SINQ) GitHub repository.

## Model Details
- **Model Name:** `Qwen3-14B-3bit-SINQ`
- **Base Model:** [`Qwen/Qwen3-14B`](https://huggingface.co/Qwen/Qwen3-14B)
- **Task:** Text Generation
- **Framework:** PyTorch / Transformers
- **License:** [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Quantized By:** *Huawei - Computing System Lab*

## Quantization Details

- **Quantization Method:** SINQ (Sinkhorn-Normalized Quantization)
- **Precision:** INT3
- **Group Size:** 64
- **Framework:** PyTorch
- **Quantization Library:** `sinq`

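At this precision, the dominant memory cost is the packed 3-bit weights plus per-group metadata. The sketch below is a rough, hedged size estimate only: it assumes a generic group-quantization layout of two 16-bit values per group of 64, which is not the exact SINQ on-disk format (SINQ stores dual row/column normalization factors), but it gives the right order of magnitude.

```python
# Back-of-the-envelope weight-memory estimate for a 14B-parameter model.
# Assumption: 3-bit weights + two fp16 metadata values per group of 64;
# this is NOT the exact SINQ storage format, just an order-of-magnitude check.
n_params = 14e9
effective_bits = 3 + (2 * 16) / 64          # = 3.5 bits per weight
quant_gb = n_params * effective_bits / 8 / 1e9
fp16_gb = n_params * 16 / 8 / 1e9
print(f"~{quant_gb:.1f} GB quantized vs ~{fp16_gb:.1f} GB in FP16 "
      f"(~{fp16_gb / quant_gb:.1f}x smaller)")  # ~6.1 GB vs ~28 GB, ~4.6x
```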
---

# 🚀 Usage

## Prerequisite
Before running the scripts below, make sure the **SINQ** library is installed.
Installation instructions and setup details are available in the official [SINQ GitHub repository](https://github.com/huawei-csl/SINQ).

## Usage example
You can load and use the model with our wrapper based on the 🤗 Transformers library:

```python
import torch
from transformers import AutoTokenizer
from sinq.patch_model import AutoSINQHFModel

model_name = "huawei-cls/Qwen3-14B-3bit-SINQ"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the pre-quantized weights directly from the Hub
sinq_model = AutoSINQHFModel.from_quantized_safetensors(
    model_name,
    device="cuda:0",
    compute_dtype=torch.bfloat16
)

prompt = "Explain neural network quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
with torch.inference_mode():
    out_ids = sinq_model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```
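
Qwen3 checkpoints are chat-tuned, so for conversational use you would typically go through the tokenizer's built-in chat template rather than a raw prompt. Below is a minimal sketch using the standard 🤗 Transformers `apply_chat_template` API, reusing `tokenizer` and `sinq_model` from above (the message content is illustrative):

```python
# Hedged sketch: chat-style generation via the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize SINQ quantization in one line."}]
chat_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to("cuda:0")
with torch.inference_mode():
    out_ids = sinq_model.generate(chat_ids, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(out_ids[0][chat_ids.shape[1]:], skip_special_tokens=True))
```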

<details>
<summary><span style="font-size:1.1em; font-weight:bold;">🧩 Quantization Process</span></summary>

The quantized model was obtained using the **SINQ** quantization library, following the steps below:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

# Load the base model in half precision
base_model_name = "Qwen/Qwen3-14B"
model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype="float16")
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Apply 3-bit SINQ quantization
quant_cfg = BaseQuantizeConfig(
    nbits=3,           # quantization bit-width
    group_size=64,     # weights per quantization group
    tiling_mode="1D",  # tiling strategy
    method="sinq"      # quantization method ("asinq" for the calibrated version)
)

qmodel = AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0"
)
```
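
To serialize the result in the safetensors layout this repository ships (the layout that `from_quantized_safetensors` reads back), a matching save call is needed. The method name below is an assumption inferred from the loader's name; check the SINQ repository for the current API:

```python
# Hypothetical save step -- verify the exact method name and signature
# against the SINQ repository before relying on it.
save_dir = "Qwen3-14B-3bit-SINQ"
AutoSINQHFModel.save_quantized_safetensors(qmodel, tokenizer, save_dir)
```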

> **Reproducibility Note**: This model was quantized using the SINQ implementation from commit [`14ad847`](https://github.com/huawei-csl/SINQ/commit/14ad847d0ab25f1794b8820506f59b5c9c1fc979) of the [SINQ](https://github.com/huawei-csl/SINQ) repository.

</details>

<br/>

---

# 🧾 How to Cite This Work

If you find **SINQ** useful in your research or applications, please
- Star ⭐ the official [SINQ](https://github.com/huawei-csl/SINQ) GitHub repository.
- Cite our <a href="http://arxiv.org/abs/2509.22944" target="_blank"><strong>paper</strong></a>:

```bibtex
@misc{muller2025sinq,
  title={SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights},
  author={Lorenz K. Muller and Philippe Bich and Jiawei Zhuang and Ahmet Celik and Luca Benfenati and Lukas Cavigelli},
  year={2025},
  eprint={2509.22944},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={http://arxiv.org/abs/2509.22944}
}
```