---

license: apache-2.0
tags:
  - gguf
  - qwen
  - llama.cpp
  - quantized
  - text-generation
  - chat
  - reasoning
  - agent
  - multilingual
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
author: geoffmunn
---


# Qwen3-Coder-30B-A3B-Instruct-Q3_K_M

Quantized version of [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) at **Q3_K_M** level, derived from **f32** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 13.70 GB
- **Precision**: Q3_K_M
- **Base Model**: [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
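
A hedged example of downloading the quantized file with `huggingface-cli` (the repository id `geoffmunn/Qwen3-Coder-30B-A3B-Instruct` is an assumption; the filename matches the one used in the Verification section below):

```bash
# Assumed repo id; adjust if the actual repository name differs
huggingface-cli download geoffmunn/Qwen3-Coder-30B-A3B-Instruct \
  "Qwen3-Coder-30B-A3B-Instruct-f32:Q3_K_M.gguf" --local-dir .
```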

## Quality & Performance

| Metric | Value |
|-------|-------|
| **Quality** |  |
| **Speed** | ⚡ Fast |
| **RAM Required** | ~24.3 GB |
| **Recommendation** | Acceptable for basic interaction on legacy hardware. |

## Prompt Template (ChatML)

This model uses the **ChatML** format used by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```



Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
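
For example, with the `{prompt}` placeholder filled in (the question itself is only an illustration), the full prompt sent to the model looks like this:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Summarize what a neural network is in one sentence.<|im_end|>
<|im_start|>assistant
```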



## Generation Parameters

Recommended defaults:

| Parameter | Value |
|---------|-------|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Min-P | 0.0 |
| Repeat Penalty | 1.1 |

Stop sequences: `<|im_end|>`, `<|im_start|>`
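
As a minimal sketch of applying these defaults with llama.cpp's `llama-cli` (the filename matches the Verification section; the prompt and token count are placeholders):

```bash
./llama-cli -m "./Qwen3-Coder-30B-A3B-Instruct-f32:Q3_K_M.gguf" \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --repeat-penalty 1.1 \
  -n 512 -r "<|im_end|>" \
  -p "Write a Python function that reverses a string."
```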



## 🖥️ CLI Example Using Ollama or TGI Server



Here’s how you can query this model via API using `curl` and `jq`. Replace the endpoint with your local server. Note that Ollama expects sampling parameters inside an `options` object:

```bash
curl http://localhost:11434/api/generate -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct:Q3_K_M",
  "prompt": "Respond exactly as follows: Summarize what a neural network is in one sentence.",
  "options": {
    "temperature": 0.3,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1
  },
  "stream": false
}' | jq -r '.response'
```

🎯 **Why this works well**:
- The prompt is meaningful and achievable for this model size.
- Temperature is tuned appropriately: low (here `0.3`) for factual output; raise it (e.g. `0.7`) for creative tasks.
- Uses `jq` to extract clean output.

## Verification

Check integrity:

```bash
certutil -hashfile 'Qwen3-Coder-30B-A3B-Instruct-f32:Q3_K_M.gguf' SHA256
# Compare with values in SHA256SUMS.txt
```
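
On Linux or macOS, `sha256sum` can be used instead (assuming the file and `SHA256SUMS.txt` are in the current directory):

```bash
# Hash a single file
sha256sum 'Qwen3-Coder-30B-A3B-Instruct-f32:Q3_K_M.gguf'

# Or check every entry listed in the checksum file
sha256sum -c SHA256SUMS.txt
```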

## Usage

Compatible with:
- [LM Studio](https://lmstudio.ai) – local AI model runner
- [OpenWebUI](https://openwebui.com) – self-hosted AI interface
- [GPT4All](https://gpt4all.io) – private, offline AI chatbot
- Directly via `llama.cpp` (see the example below)
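
For instance, llama.cpp's bundled `llama-server` can serve the file over an OpenAI-compatible HTTP API (the port and context size below are illustrative choices, not requirements):

```bash
# Launch a local server; point LM Studio/OpenWebUI-style clients at http://localhost:8080/v1
./llama-server -m "./Qwen3-Coder-30B-A3B-Instruct-f32:Q3_K_M.gguf" \
  --port 8080 --ctx-size 8192
```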

## License

Apache 2.0 – see base model for full terms.