geoffmunn committed on
Commit 7ba860d · verified · 1 Parent(s): 80831b8

Rename Qwen3-Coder-30B-A3B-Instruct-Q2_K/README.md to Qwen3-Coder-30B-A3B-Instruct-f32-Q2_K/README.md

Qwen3-Coder-30B-A3B-Instruct-Q2_K/README.md DELETED
Binary file (2.77 kB)
 
Qwen3-Coder-30B-A3B-Instruct-f32-Q2_K/README.md ADDED
---
license: apache-2.0
tags:
- gguf
- qwen
- qwen3-coder
- qwen3-coder-30b-q2
- qwen3-coder-30b-q2_k
- qwen3-coder-30b-q2_k-gguf
- llama.cpp
- quantized
- text-generation
- chat
- reasoning
- agent
- multilingual
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
author: geoffmunn
---

# Qwen3-Coder-30B-A3B-Instruct-f32:Q2_K

Quantized version of [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) at **Q2_K** level, derived from **f32** base weights.

## Model Info

- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 11.30 GB
- **Precision**: Q2_K
- **Base Model**: [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Quality & Performance

| Metric             | Value                                                  |
|--------------------|--------------------------------------------------------|
| **Quality**        | Minimal                                                |
| **Speed**          | ⚡ Fast                                                |
| **RAM Required**   | ~20.5 GB                                               |
| **Recommendation** | Minimal quality; only for extreme memory constraints.  |

## Prompt Template (ChatML)

This model uses the **ChatML** prompt format adopted by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this in your app (LM Studio, OpenWebUI, etc.) for best results.

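Most apps apply this template automatically from the GGUF metadata. As a minimal sketch of doing it by hand, here is a raw ChatML prompt passed to llama.cpp's `llama-cli`; the model filename is an assumption based on this repo's layout, so adjust it to your download:

```bash
# Minimal sketch: a hand-built ChatML prompt fed to llama-cli with
# conversation mode disabled. The filename is assumed; adjust to
# match your local GGUF.
llama-cli -m Qwen3-Coder-30B-A3B-Instruct-f32-Q2_K.gguf \
  --temp 0.6 --top-p 0.95 --top-k 20 --repeat-penalty 1.1 \
  -no-cnv \
  -p $'<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWrite a Python one-liner that reverses a string.<|im_end|>\n<|im_start|>assistant\n'
```
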
## Generation Parameters

Recommended defaults:

| Parameter      | Value |
|----------------|-------|
| Temperature    | 0.6   |
| Top-P          | 0.95  |
| Top-K          | 20    |
| Min-P          | 0.0   |
| Repeat Penalty | 1.1   |

Stop sequences: `<|im_end|>`, `<|im_start|>`

## 🖥️ CLI Example Using Ollama or TGI Server

Here's how to query this model via the Ollama API using `curl` and `jq`; replace the endpoint to match your local server (TGI exposes a different API, so adapt the request if you use it). Note that Ollama reads sampling parameters from a nested `options` object:

```bash
curl http://localhost:11434/api/generate -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32:Q2_K",
  "prompt": "Respond exactly as follows: Summarize what a neural network is in one sentence.",
  "stream": false,
  "options": {
    "temperature": 0.3,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1
  }
}' | jq -r '.response'
```


🎯 **Why this works well**:
- The prompt is meaningful and achievable for this model size.
- Temperature is tuned to the task: lower for factual output (`0.3` here), higher for creative writing (around `0.7`).
- Uses `jq` to extract clean output.
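
If you'd rather not manage the prompt yourself, Ollama's `/api/chat` endpoint applies the ChatML template server-side. A minimal sketch against the same local server:

```bash
# Chat-style request: Ollama builds the ChatML prompt from the
# messages array, so no manual template handling is needed.
curl http://localhost:11434/api/chat -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32:Q2_K",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what a neural network is in one sentence."}
  ],
  "stream": false,
  "options": {"temperature": 0.3, "top_p": 0.95, "top_k": 20, "repeat_penalty": 1.1}
}' | jq -r '.message.content'
```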

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```
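
If you downloaded only this quantization rather than the whole repository, you can hash the single file instead; the filename below is an assumption based on the repo layout, so adjust it to your local copy:

```bash
# Hash just this quantization's GGUF (assumed filename) and compare
# the digest with the Q2_K entry in the published SHA256SUMS.txt.
sha256sum Qwen3-Coder-30B-A3B-Instruct-f32-Q2_K.gguf
```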

## Usage

Compatible with:
- [LM Studio](https://lmstudio.ai) – local AI model runner
- [OpenWebUI](https://openwebui.com) – self-hosted AI interface
- [GPT4All](https://gpt4all.io) – private, offline AI chatbot
- Directly via `llama.cpp` (see the server sketch below)
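
For direct `llama.cpp` use, a minimal sketch that starts the bundled OpenAI-compatible HTTP server with this quantization (the GGUF filename is an assumption; adjust to your download):

```bash
# Serve the model locally with llama.cpp's llama-server.
# Filename is assumed; sampling defaults mirror the table above.
llama-server -m Qwen3-Coder-30B-A3B-Instruct-f32-Q2_K.gguf \
  --ctx-size 8192 \
  --temp 0.6 --top-p 0.95 --top-k 20 --repeat-penalty 1.1 \
  --port 8080
```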

## License

Apache 2.0 – see base model for full terms.