---
license: apache-2.0
tags:
- gguf
- qwen
- qwen3-coder
- qwen3-coder-30b-q5
- qwen3-coder-30b-q5_k_s
- qwen3-coder-30b-q5_k_s-gguf
- llama.cpp
- quantized
- text-generation
- chat
- reasoning
- agent
- multilingual
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
author: geoffmunn
---

# Qwen3-Coder-30B-A3B-Instruct-f32:Q5_K_S

Quantized version of [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) at **Q5_K_S** level, derived from **f32** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 21.10 GB
- **Precision**: Q5_K_S
- **Base Model**: [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Quality & Performance

| Metric             | Value                                                          |
|--------------------|----------------------------------------------------------------|
| **Speed**          | 🐒 Medium                                                      |
| **RAM Required**   | ~32.5 GB                                                       |
| **Recommendation** | High quality for this model; a slight improvement over Q4_K_M. |

## Prompt Template (ChatML)

This model uses the **ChatML** format used by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.

## Generation Parameters

Recommended defaults:

| Parameter      | Value |
|----------------|-------|
| Temperature    | 0.6   |
| Top-P          | 0.95  |
| Top-K          | 20    |
| Min-P          | 0.0   |
| Repeat Penalty | 1.1   |

Stop sequences: `<|im_end|>`, `<|im_start|>`

## πŸ–₯️ CLI Example Using Ollama or TGI Server

Here’s how you can query this model via API using `curl` and `jq`. The example targets Ollama’s `/api/generate` endpoint; replace the endpoint (and request format) with your local server’s if you use a different runtime. Note that Ollama expects sampling parameters nested under `options`.

```bash
curl http://localhost:11434/api/generate -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32:Q5_K_S",
  "prompt": "Respond exactly as follows: Write a haiku about autumn leaves falling gently in a quiet forest.",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1
  }
}' | jq -r '.response'
```

🎯 **Why this works well**:

- The prompt is meaningful and achievable for this model size.
- Temperature is tuned to the task: lower for factual output (`0.5`), higher for creative output (`0.7`).
- `jq` extracts clean text from the JSON response.

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```

## Usage

Compatible with:

- [LM Studio](https://lmstudio.ai) – local AI model runner
- [OpenWebUI](https://openwebui.com) – self-hosted AI interface
- [GPT4All](https://gpt4all.io) – private, offline AI chatbot
- Directly via `llama.cpp` (see the CLI sketch at the end of this card)

## License

Apache 2.0 – see the base model for full terms.
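
## πŸ’» Direct llama.cpp Example

If you run the GGUF file directly with `llama.cpp`, the sketch below shows one way to apply the recommended generation parameters from this card. It is a minimal example, not a definitive invocation: the local file name `Qwen3-Coder-30B-A3B-Instruct-f32-Q5_K_S.gguf`, the context size, and the GPU offload value are assumptions to adjust for your setup, and flag names can vary slightly between llama.cpp versions.

```bash
# Minimal interactive chat with llama.cpp's CLI, using the sampling defaults from this card.
# The model path is an assumption; point -m at wherever you saved the GGUF file.
# -c sets the context window, -ngl the number of layers offloaded to the GPU (omit for CPU-only),
# and -cnv enables conversation mode, which applies the model's built-in ChatML template.
llama-cli \
  -m ./Qwen3-Coder-30B-A3B-Instruct-f32-Q5_K_S.gguf \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0 \
  --repeat-penalty 1.1 \
  -c 8192 \
  -ngl 99 \
  -cnv
```

The same sampling flags work with `llama-server`, which exposes an OpenAI-compatible HTTP endpoint for clients such as OpenWebUI.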