---
license: apache-2.0
tags:
- gguf
- qwen
- qwen3-coder
- qwen3-coder-30b-q5
- qwen3-coder-30b-q5_k_s
- qwen3-coder-30b-q5_k_s-gguf
- llama.cpp
- quantized
- text-generation
- chat
- reasoning
- agent
- multilingual
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
author: geoffmunn
---

# Qwen3-Coder-30B-A3B-Instruct-f32:Q5_K_S

Quantized version of [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) at **Q5_K_S** level, derived from **f32** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 21.10 GB
- **Precision**: Q5_K_S
- **Base Model**: [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Quality & Performance

| Metric             | Value                                                          |
|--------------------|----------------------------------------------------------------|
| **Speed**          | 🐒 Medium                                                      |
| **RAM Required**   | ~32.5 GB                                                       |
| **Recommendation** | High quality for this model; a slight improvement over Q4_K_M. |

## Prompt Template (ChatML)

This model uses the **ChatML** format used by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this template in your app (LM Studio, OpenWebUI, etc.) for best results.

## Generation Parameters

Recommended defaults:

| Parameter      | Value |
|----------------|-------|
| Temperature    | 0.6   |
| Top-P          | 0.95  |
| Top-K          | 20    |
| Min-P          | 0.0   |
| Repeat Penalty | 1.1   |

Stop sequences: `<|im_end|>`, `<|im_start|>`

## πŸ–₯️ CLI Example Using Ollama or TGI Server

Here’s how you can query this model via API using `curl` and `jq`. The example targets Ollama’s `/api/generate` endpoint; replace the endpoint (and request format) with your local server’s if you use a different runtime. Note that Ollama expects sampling parameters nested under `options`.

```bash
curl http://localhost:11434/api/generate -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32:Q5_K_S",
  "prompt": "Respond exactly as follows: Write a haiku about autumn leaves falling gently in a quiet forest.",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1
  }
}' | jq -r '.response'
```

🎯 **Why this works well**:

- The prompt is meaningful and achievable for this model size.
- Temperature is tuned to the task: lower for factual output (`0.5`), higher for creative output (`0.7`).
- `jq` extracts clean text from the JSON response.

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```

## Usage

Compatible with:

- [LM Studio](https://lmstudio.ai) – local AI model runner
- [OpenWebUI](https://openwebui.com) – self-hosted AI interface
- [GPT4All](https://gpt4all.io) – private, offline AI chatbot
- Directly via `llama.cpp` (see the CLI sketch at the end of this card)

## License

Apache 2.0 – see the base model for full terms.
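
## πŸ’» Direct llama.cpp Example

If you run the GGUF file directly with `llama.cpp`, the sketch below shows one way to apply the recommended generation parameters from this card. It is a minimal example, not a definitive invocation: the local file name `Qwen3-Coder-30B-A3B-Instruct-f32-Q5_K_S.gguf`, the context size, and the GPU offload value are assumptions to adjust for your setup, and flag names can vary slightly between llama.cpp versions.

```bash
# Minimal interactive chat with llama.cpp's CLI, using the sampling defaults from this card.
# The model path is an assumption; point -m at wherever you saved the GGUF file.
# -c sets the context window, -ngl the number of layers offloaded to the GPU (omit for CPU-only),
# and -cnv enables conversation mode, which applies the model's built-in ChatML template.
llama-cli \
  -m ./Qwen3-Coder-30B-A3B-Instruct-f32-Q5_K_S.gguf \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0 \
  --repeat-penalty 1.1 \
  -c 8192 \
  -ngl 99 \
  -cnv
```

The same sampling flags work with `llama-server`, which exposes an OpenAI-compatible HTTP endpoint for clients such as OpenWebUI.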