geoffmunn committed · verified
Commit 9d20830 · 1 Parent(s): 80083b2

Rename Qwen3-Coder-30B-A3B-Instruct-Q3_K_S/README.md to Qwen3-Coder-30B-A3B-Instruct-f32-Q3_K_S/README.md

Qwen3-Coder-30B-A3B-Instruct-Q3_K_S/README.md DELETED (2.77 kB)
Qwen3-Coder-30B-A3B-Instruct-f32-Q3_K_S/README.md ADDED
---
license: apache-2.0
tags:
- gguf
- qwen
- qwen3-coder
- qwen3-coder-30b-q3
- qwen3-coder-30b-q3_k_s
- qwen3-coder-30b-q3_k_s-gguf
- llama.cpp
- quantized
- text-generation
- chat
- reasoning
- agent
- multilingual
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
author: geoffmunn
---

# Qwen3-Coder-30B-A3B-Instruct-f32:Q3_K_S

Quantized version of [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) at the **Q3_K_S** level, derived from **f32** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 13.30 GB
- **Precision**: Q3_K_S
- **Base Model**: [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Quality & Performance

| Metric             | Value                                                   |
|--------------------|---------------------------------------------------------|
| **Speed**          | ⚡ Fast                                                  |
| **RAM Required**   | ~23.2 GB                                                |
| **Recommendation** | Low quality; barely usable. Avoid unless space-limited. |

## Prompt Template (ChatML)

This model uses the **ChatML** prompt format used by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.

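If you are scripting against a raw completion endpoint rather than a chat API, the template has to be filled in by hand. A minimal sketch (the question string is just an illustrative placeholder):

```shell
# Substitute a user question into the ChatML template
QUESTION="What is a GGUF file?"
PROMPT=$(printf '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' "$QUESTION")
printf '%s\n' "$PROMPT"
```

Generation should then begin right after the final `<|im_start|>assistant` line.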
## Generation Parameters

Recommended defaults:

| Parameter      | Value |
|----------------|-------|
| Temperature    | 0.6   |
| Top-P          | 0.95  |
| Top-K          | 20    |
| Min-P          | 0.0   |
| Repeat Penalty | 1.1   |

Stop sequences: `<|im_end|>`, `<|im_start|>`

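These defaults map directly onto `llama.cpp` command-line flags. A sketch, assuming a local `llama.cpp` build with `llama-cli` on your `PATH` and the GGUF file in the current directory (the filename is illustrative):

```shell
# Run an interactive completion with the recommended sampling defaults
llama-cli -m Qwen3-Coder-30B-A3B-Instruct-f32-Q3_K_S.gguf \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --repeat-penalty 1.1 \
  -p "Write a Python function that reverses a string." -n 256
```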
## 🖥️ CLI Example Using Ollama or TGI Server

Here’s how you can query this model via API using `curl` and `jq`. Replace the endpoint with your local server. Note that Ollama expects sampling parameters inside an `options` object:

```bash
curl http://localhost:11434/api/generate -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32:Q3_K_S",
  "prompt": "Respond exactly as follows: Summarize what a neural network is in one sentence.",
  "options": {
    "temperature": 0.3,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1
  },
  "stream": false
}' | jq -r '.response'
```

🎯 **Why this works well**:
- The prompt is meaningful and achievable for this model size.
- Temperature is kept low (`0.3`) because this is a factual task; raise it toward `0.7` for creative tasks.
- Uses `jq` to extract clean output.

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```

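As a quick illustration of how such a checksum file works (using a throwaway file rather than the real model weights):

```shell
# Create a dummy file, record its digest, then verify it
printf 'example payload' > dummy.bin
sha256sum dummy.bin > SHA256SUMS.txt
sha256sum -c SHA256SUMS.txt   # prints "dummy.bin: OK" when the file is intact
```

If a download is corrupted or truncated, the check reports `FAILED` for that file instead.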
## Usage

Compatible with:
- [LM Studio](https://lmstudio.ai) – local AI model runner
- [OpenWebUI](https://openwebui.com) – self-hosted AI interface
- [GPT4All](https://gpt4all.io) – private, offline AI chatbot
- Directly via `llama.cpp`

## License

Apache 2.0 – see the base model for full terms.