geoffmunn committed on
Commit 7ba860d · verified · 1 Parent(s): 80831b8

Rename Qwen3-Coder-30B-A3B-Instruct-Q2_K/README.md to Qwen3-Coder-30B-A3B-Instruct-f32-Q2_K/README.md

Qwen3-Coder-30B-A3B-Instruct-Q2_K/README.md DELETED
Binary file (2.77 kB)
 
Qwen3-Coder-30B-A3B-Instruct-f32-Q2_K/README.md ADDED
---
license: apache-2.0
tags:
- gguf
- qwen
- qwen3-coder
- qwen3-coder-30b-q2
- qwen3-coder-30b-q2_k
- qwen3-coder-30b-q2_k-gguf
- llama.cpp
- quantized
- text-generation
- chat
- reasoning
- agent
- multilingual
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
author: geoffmunn
---

# Qwen3-Coder-30B-A3B-Instruct-f32:Q2_K

Quantized version of [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) at **Q2_K** level, derived from **f32** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 11.30 GB
- **Precision**: Q2_K
- **Base Model**: [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Quality & Performance

| Metric             | Value                                                  |
|--------------------|--------------------------------------------------------|
| **Quality**        | Minimal                                                |
| **Speed**          | ⚡ Fast                                                |
| **RAM Required**   | ~20.5 GB                                               |
| **Recommendation** | Minimal quality; only for extreme memory constraints.  |

## Prompt Template (ChatML)

This model uses the **ChatML** prompt format adopted by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this in your app (LM Studio, OpenWebUI, etc.) for best results.

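Most apps apply this template automatically from the GGUF metadata. As a minimal sketch of doing it by hand, here is a raw ChatML prompt passed to llama.cpp's `llama-cli`; the model filename is an assumption based on this repo's layout, so adjust it to your download:

```bash
# Minimal sketch: a hand-built ChatML prompt fed to llama-cli with
# conversation mode disabled. The filename is assumed; adjust to
# match your local GGUF.
llama-cli -m Qwen3-Coder-30B-A3B-Instruct-f32-Q2_K.gguf \
  --temp 0.6 --top-p 0.95 --top-k 20 --repeat-penalty 1.1 \
  -no-cnv \
  -p $'<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWrite a Python one-liner that reverses a string.<|im_end|>\n<|im_start|>assistant\n'
```
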
## Generation Parameters

Recommended defaults:

| Parameter      | Value |
|----------------|-------|
| Temperature    | 0.6   |
| Top-P          | 0.95  |
| Top-K          | 20    |
| Min-P          | 0.0   |
| Repeat Penalty | 1.1   |

Stop sequences: `<|im_end|>`, `<|im_start|>`

## 🖥️ CLI Example Using Ollama or TGI Server

Here's how to query this model via the Ollama API using `curl` and `jq`; replace the endpoint to match your local server (TGI exposes a different API, so adapt the request if you use it). Note that Ollama reads sampling parameters from a nested `options` object:

```bash
curl http://localhost:11434/api/generate -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32:Q2_K",
  "prompt": "Respond exactly as follows: Summarize what a neural network is in one sentence.",
  "stream": false,
  "options": {
    "temperature": 0.3,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1
  }
}' | jq -r '.response'
```


🎯 **Why this works well**:
- The prompt is meaningful and achievable for this model size.
- Temperature is tuned to the task: lower for factual output (`0.3` here), higher for creative writing (around `0.7`).
- Uses `jq` to extract clean output.
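
If you'd rather not manage the prompt yourself, Ollama's `/api/chat` endpoint applies the ChatML template server-side. A minimal sketch against the same local server:

```bash
# Chat-style request: Ollama builds the ChatML prompt from the
# messages array, so no manual template handling is needed.
curl http://localhost:11434/api/chat -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32:Q2_K",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what a neural network is in one sentence."}
  ],
  "stream": false,
  "options": {"temperature": 0.3, "top_p": 0.95, "top_k": 20, "repeat_penalty": 1.1}
}' | jq -r '.message.content'
```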

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```
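
If you downloaded only this quantization rather than the whole repository, you can hash the single file instead; the filename below is an assumption based on the repo layout, so adjust it to your local copy:

```bash
# Hash just this quantization's GGUF (assumed filename) and compare
# the digest with the Q2_K entry in the published SHA256SUMS.txt.
sha256sum Qwen3-Coder-30B-A3B-Instruct-f32-Q2_K.gguf
```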

## Usage

Compatible with:
- [LM Studio](https://lmstudio.ai) – local AI model runner
- [OpenWebUI](https://openwebui.com) – self-hosted AI interface
- [GPT4All](https://gpt4all.io) – private, offline AI chatbot
- Directly via `llama.cpp` (see the server sketch below)
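
For direct `llama.cpp` use, a minimal sketch that starts the bundled OpenAI-compatible HTTP server with this quantization (the GGUF filename is an assumption; adjust to your download):

```bash
# Serve the model locally with llama.cpp's llama-server.
# Filename is assumed; sampling defaults mirror the table above.
llama-server -m Qwen3-Coder-30B-A3B-Instruct-f32-Q2_K.gguf \
  --ctx-size 8192 \
  --temp 0.6 --top-p 0.95 --top-k 20 --repeat-penalty 1.1 \
  --port 8080
```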

## License

Apache 2.0 – see base model for full terms.