---
language:
- en
license: apache-2.0
tags:
- quantization
- sinq
- int3
- efficient-inference
- text-generation
- qwen
- llm
- compression
base_model: Qwen/Qwen3-1.7B
base_model_relation: quantized
---

<p align="center">
  <img src="logo.png" alt="Logo" style="max-width: 80%; height: auto;">
</p>

<p align="center"><a href="https://github.com/huawei-csl/SINQ">GitHub</a> | <a href="http://arxiv.org/abs/2509.22944">Paper</a></p>

# A-SINQ 3-bit Quantized Qwen3-1.7B Model

This repository contains the official **3-bit quantized** version of the [`Qwen3-1.7B`](https://huggingface.co/Qwen/Qwen3-1.7B) model, produced with the *calibrated* version of the **SINQ (Sinkhorn-Normalized Quantization)** method.
SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact.

To support the project, please put a star ⭐ on the official [SINQ](https://github.com/huawei-csl/SINQ) GitHub repository.

## Model Details

- **Model Name:** `Qwen3-1.7B-3bit-ASINQ`
- **Base Model:** [`Qwen/Qwen3-1.7B`](https://huggingface.co/Qwen/Qwen3-1.7B)
- **Task:** Text Generation
- **Framework:** PyTorch / Transformers
- **License:** [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Quantized By:** *Huawei - Computing Systems Lab*

## Quantization Details

- **Quantization Method:** A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- **Precision:** INT3
- **Group Size:** 64
- **Framework:** PyTorch
- **Quantization Library:** `sinq`
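
As a rough illustration of what INT3 with group size 64 implies for weight memory, the back-of-envelope estimate below compares FP16 with 3-bit storage. The per-group metadata cost (one 16-bit scale and one 16-bit offset per group of 64) is an assumption for illustration only, not the exact SINQ packing format.

```python
# Back-of-envelope weight-memory estimate (illustrative only; the exact
# on-disk size depends on SINQ's packing and per-group metadata layout).
params = 1.7e9                 # approximate weight count of Qwen3-1.7B
fp16_gib = params * 2 / 2**30  # 2 bytes per weight in FP16

# Assumption: 3-bit weights plus a 16-bit scale and a 16-bit offset
# per group of 64 weights (hypothetical metadata layout).
bits_per_weight = 3 + (16 + 16) / 64
int3_gib = params * bits_per_weight / 8 / 2**30

print(f"FP16: {fp16_gib:.2f} GiB")
print(f"INT3: {int3_gib:.2f} GiB (~{fp16_gib / int3_gib:.1f}x smaller)")
```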
					
					
						

---

# Usage

## Prerequisite

Before running the code below, make sure the **SINQ** library is installed.
Installation instructions and setup details are available in the [official SINQ GitHub repository](https://github.com/huawei-csl/SINQ).
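
To verify the installation from Python before proceeding, a simple import check is enough (the `sinq` package name follows the imports used later in this card):

```python
# Check that the SINQ library is importable.
try:
    import sinq  # noqa: F401
    print("SINQ is installed.")
except ImportError:
    print("SINQ not found; see https://github.com/huawei-csl/SINQ for install instructions.")
```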
					
					
						

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library:

```python
import torch
from transformers import AutoTokenizer
from sinq.patch_model import AutoSINQHFModel

model_name = "huawei-csl/Qwen3-1.7B-3bit-ASINQ"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the quantized weights directly from the Hub
sinq_model = AutoSINQHFModel.from_quantized_safetensors(
    model_name,
    device="cuda:0",
    compute_dtype=torch.bfloat16
)

# Greedy decoding with the quantized model
prompt = "Explain neural network quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
with torch.inference_mode():
    out_ids = sinq_model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```
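
For chat-style prompting, the tokenizer's standard 🤗 Transformers chat-template API can be used with the same model. The snippet below is a minimal sketch that reuses `sinq_model` and `tokenizer` from the example above; the prompt and generation settings are illustrative.

```python
# Minimal chat-style sketch (reuses sinq_model and tokenizer from above).
messages = [{"role": "user", "content": "Give one advantage of 3-bit quantization."}]
chat_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(chat_prompt, return_tensors="pt").to("cuda:0")
with torch.inference_mode():
    out_ids = sinq_model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(out_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```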
					
					
						

<details>
<summary><span style="font-size:1.1em; font-weight:bold;">Quantization Process</span></summary>

The quantized model was obtained using the **SINQ** quantization library, following the steps below:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

# Load the base model in half precision
base_model_name = "Qwen/Qwen3-1.7B"
model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype="float16")
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Apply 3-bit A-SINQ quantization
quant_cfg = BaseQuantizeConfig(
    nbits=3,           # quantization bit-width
    group_size=64,     # group size
    tiling_mode="1D",  # tiling strategy
    method="asinq"     # quantization method ("asinq" for the calibrated version)
)

qmodel = AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0"
)
```
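
As a quick sanity check, the freshly quantized model can be exercised the same way as in the usage example above (this assumes, as that example shows, that the patched model exposes the standard `generate` API):

```python
# Smoke test: run a short greedy generation on the quantized model.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
with torch.inference_mode():
    out_ids = qmodel.generate(**inputs, max_new_tokens=16, do_sample=False)
print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```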
					
					
						

> **Reproducibility Note:** This model was quantized using the SINQ implementation from commit [`14ad847`](https://github.com/huawei-csl/SINQ/commit/14ad847d0ab25f1794b8820506f59b5c9c1fc979) of the [SINQ](https://github.com/huawei-csl/SINQ) repository.

</details>

<br>

---

# How to Cite This Work

If you find **SINQ** useful in your research or applications, please

- Put a star ⭐ on the official [SINQ](https://github.com/huawei-csl/SINQ) GitHub repository.
- Cite our <a href="http://arxiv.org/abs/2509.22944" target="_blank"><strong>paper</strong></a>:

```bibtex
@misc{muller2025sinq,
      title={SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights},
      author={Lorenz K. Muller and Philippe Bich and Jiawei Zhuang and Ahmet Celik and Luca Benfenati and Lukas Cavigelli},
      year={2025},
      eprint={2509.22944},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={http://arxiv.org/abs/2509.22944}
}
```