---
license: apache-2.0
---
This repository contains 3 versions of Qwen3-30B-A3B quantized with `IQ4_KS`. The interesting part is that these models achieve a lower perplexity on `wiki.test.raw` than the original `bf16` model. This is surprising, considering that no QAT is mentioned in the [Qwen3 announcement](https://qwenlm.github.io/blog/qwen3/). Hence, I'm putting them out there for anyone interested in evaluating performance by means other than PPL, or simply for local inference.

For more details see [this discussion](https://github.com/ikawrakow/ik_llama.cpp/discussions/359).
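
The perplexity comparison can be reproduced with the `llama-perplexity` tool built from the `ik_llama.cpp` repository. A minimal sketch (the model filenames match the list below; flags follow mainline `llama.cpp` conventions and may differ slightly in the fork):

```shell
# Evaluate perplexity on the WikiText-2 test set for one of the quantized models
./llama-perplexity -m Qwen3-30B-A3B-IQ4_KS-IK-Imatrix.gguf -f wiki.test.raw

# Repeat with a bf16 GGUF of the model to get the unquantized baseline
./llama-perplexity -m Qwen3-30B-A3B-bf16.gguf -f wiki.test.raw
```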
**Note**: These models will only work with [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp) as the `IQ4_KS` quantization type is not available in mainline `llama.cpp`.
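
For reference, a typical invocation might look like the following. This is a sketch under the assumption that `ik_llama.cpp` keeps the mainline `llama.cpp` binary names and flags; check the fork's documentation for specifics:

```shell
# Serve one of the quantized models with ik_llama.cpp
# -c sets the context size, -ngl the number of layers offloaded to the GPU
./llama-server -m Qwen3-30B-A3B-IQ4_KS-IK-Imatrix.gguf -c 8192 -ngl 99
```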
The only difference between the 3 models is the imatrix used:
* **Qwen3-30B-A3B-IQ4_KS-IK-Imatrix.gguf**: imatrix computed using 500,000 tokens from `wiki.train.raw`
* **Qwen3-30B-A3B-IQ4_KS-Bartowski-Imatrix.gguf**: Bartowski's [imatrix](https://huggingface.co/bartowski/Qwen_Qwen3-30B-A3B-GGUF/blob/main/Qwen_Qwen3-30B-A3B.imatrix)
* **Qwen3-30B-A3B-IQ4_KS-Unsloth-Imatrix.gguf**: Unsloth [imatrix](https://huggingface.co/unsloth/Qwen3-30B-A3B-128K-GGUF/blob/main/imatrix_unsloth.dat)
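
A similar quantization can be reproduced from a `bf16` GGUF with the `llama-imatrix` and `llama-quantize` tools. A sketch, assuming the fork keeps the mainline tool names; paths and filenames here are illustrative:

```shell
# 1) Compute an importance matrix from calibration text (e.g. wiki.train.raw)
./llama-imatrix -m Qwen3-30B-A3B-bf16.gguf -f wiki.train.raw -o imatrix.dat

# 2) Quantize to IQ4_KS using that imatrix (IQ4_KS is ik_llama.cpp-only)
./llama-quantize --imatrix imatrix.dat \
    Qwen3-30B-A3B-bf16.gguf Qwen3-30B-A3B-IQ4_KS.gguf IQ4_KS
```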