---
# 🦄 Model Card
base_model: unsloth/Qwen2.5-3B-Instruct

| **Finetuned from** | `unsloth/Qwen2.5-3B-Instruct` |
| **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) × Hugging Face TRL |

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="180"/>](https://github.com/unslothai/unsloth)

---

## 🚀 What’s New?

Think of this as the protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured**—but now with **3 B parameters, zero SFT, and a reward-only training regime** (GRPO) backed by an LM judge + auxiliary reward functions.

| Upgrade | Explanation |
|---------|-------------|
| **Bigger Backbone** | 1.5 B → **3 B** Qwen 2.5 for deeper reasoning headroom. |
| **Pure RL** | No supervised fine-tuning—policy learned *entirely* from reward signals. |
| **LM-as-Judge** | Separate LLM scores each candidate for correctness, JSON validity, length & style. |
| **2× Faster Training** | Courtesy of Unsloth’s memory savings (flash attention, fused ops). |
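
The auxiliary rewards are simple, deterministic signals layered on top of the LM judge. The exact reward functions used in training are not published here, but a minimal sketch of a JSON-validity plus length reward (illustrative names and thresholds, not the training code) looks like this:

```python
import json

def json_validity_reward(completion: str) -> float:
    """+1 if the completion parses as JSON, otherwise a penalty."""
    try:
        json.loads(completion)
        return 1.0
    except json.JSONDecodeError:
        return -1.0

def length_reward(completion: str, max_chars: int = 1500) -> float:
    """Small bonus for staying concise; nothing beyond the budget."""
    return 0.2 if len(completion) <= max_chars else 0.0

def total_auxiliary_reward(completion: str) -> float:
    # In GRPO, scores like these are combined with the judge's score
    # to rank each group of sampled completions.
    return json_validity_reward(completion) + length_reward(completion)
```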

---

## 🛠️ Intended Use

- Structured-data extraction from messy prose, logs, or transcripts.
- Drop-in upgrade for any pipeline currently using the older 1.5 B DeepSeek-R1 JSON-structurer.

---

## 🔧 How to Use

Below is a minimal example that **re-uses the exact prompt format** from the previous model.
The model first *reasons* silently (chain-of-thought is kept internal) and then emits only the target JSON.

> **Model name** used in the snippets below → `MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Merged-Lora-16bit`
> Replace it with your own repo path (e.g. `bhaviktheslider/unsloth-qwen2.5-3b-grpo-json-structurer`) if you host the model elsewhere.

### 1️⃣ Transformers Quick-Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch, json, textwrap

MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Merged-Lora-16bit"

tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
)

# --- Prompt (identical structure to the previous model) ---
system_prompt = (
    "You are an intelligent JSON conversion engine. "
    "Think step-by-step, and then output the final valid JSON."
)

task_prompt = textwrap.dedent("""\
    ### Task
    Convert the following unstructured text into the JSON schema shown below.
    Return *only* valid JSON.

    ### Schema
    {
        "name": str,
        "age": int,
        "city": str,
        "skills": [str]
    }

    ### Unstructured text
    John Doe, a 28-year-old software engineer living in Austin, loves Python and Golang.
    """)

generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tok,
    max_new_tokens=256,
    do_sample=False,
    return_full_text=False,  # return only the completion, not the echoed prompt
)

output = generator(f"<|system|>\n{system_prompt}\n<|user|>\n{task_prompt}")[0]["generated_text"]

data = json.loads(output)  # ✅ raises if the JSON isn’t valid
print(data)
```
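
The snippet above re-uses the previous model’s raw `<|system|>` / `<|user|>` tags. If you prefer to let the tokenizer build Qwen’s native chat format instead (assuming the merged checkpoint still ships the base model’s chat template), you can swap in `apply_chat_template`, continuing from the variables defined above:

```python
# Alternative prompt construction via the tokenizer's chat template.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": task_prompt},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

output = generator(prompt)[0]["generated_text"]
print(json.loads(output))
```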

### 2️⃣ Text-Generation-Inference (TGI)

```bash
# start the server (8-bit, BF16, etc. as needed)
text-generation-launcher --model-id MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Merged-Lora-16bit

# curl call
curl http://localhost:8080/generate \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": "<|system|>\nYou are an intelligent JSON conversion engine. Think step-by-step, but ONLY output the final valid JSON.\n<|user|>\n### Task\nConvert the following unstructured text into the JSON schema shown below.\nReturn *only* valid JSON.\n\n### Schema\n{\"title\": str, \"authors\": [str], \"year\": int}\n\n### Unstructured text\n\"Deep Learning\" was written by Ian Goodfellow, Yoshua Bengio, and Aaron Courville in 2016.\n",
    "parameters": {"max_new_tokens": 256, "do_sample": false}
  }'
```

The `generated_text` field of the response will contain a pure JSON string, e.g.:

```json
{"title":"Deep Learning","authors":["Ian Goodfellow","Yoshua Bengio","Aaron Courville"],"year":2016}
```
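
If you would rather call the TGI endpoint from Python than from curl, the same request can be sent with `requests`. A small sketch, assuming the prompt string from the curl example and whatever host/port your launcher exposes:

```python
import json
import requests

prompt = (
    "<|system|>\nYou are an intelligent JSON conversion engine. "
    "Think step-by-step, but ONLY output the final valid JSON.\n"
    "<|user|>\n### Task\nConvert the following unstructured text into the JSON schema shown below.\n"
    "Return *only* valid JSON.\n\n"
    "### Schema\n{\"title\": str, \"authors\": [str], \"year\": int}\n\n"
    "### Unstructured text\n"
    "\"Deep Learning\" was written by Ian Goodfellow, Yoshua Bengio, and Aaron Courville in 2016.\n"
)

resp = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": prompt, "parameters": {"max_new_tokens": 256, "do_sample": False}},
    timeout=60,
)
record = json.loads(resp.json()["generated_text"])  # TGI wraps the text in {"generated_text": ...}
print(record["title"], record["year"])
```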

---

## 🤖 Why This Prompt Works

1. **System role** instructs the model to plan internally and expose *only* the JSON.
2. **Schema block** constrains the output keys & types.
3. **GRPO training** rewarded strict adherence to schema validity and penalized hallucinated keys or malformed JSON.
4. **LM-as-Judge** provides dense shaping signals: structural accuracy, content fidelity, token length, even stylistic consistency.

The result: *reliable one-shot structuring without post-processing hacks*.
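
That said, if you want a belt-and-braces check in production, a tiny guard that parses the output and verifies the expected keys costs almost nothing. A sketch against the example schema from the quick-start; adapt the key/type map to your own schema:

```python
import json

EXPECTED = {"name": str, "age": int, "city": str, "skills": list}

def parse_and_check(raw: str) -> dict:
    """Parse model output and verify it matches the example schema."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    missing = [k for k in EXPECTED if k not in data]
    wrong_type = [k for k, t in EXPECTED.items() if k in data and not isinstance(data[k], t)]
    if missing or wrong_type:
        raise ValueError(f"schema mismatch: missing={missing}, wrong_type={wrong_type}")
    return data

print(parse_and_check('{"name": "John Doe", "age": 28, "city": "Austin", "skills": ["Python", "Golang"]}'))
```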

---

## 🏋️ Training Recipe (Condensed)

| Setting | Value |
| -------------------- | ------------------------------------------------------------------- |
| **Algorithm** | GRPO (policy ≈ LM; reward LM ≈ `Qwen2.5-7B` w/ JSON validator head) |
| **Effective Epochs** | 3 |
| **Batching** | Gradient accumulation 8, bfloat16 |
| **Optimizer** | Fused AdamW |
| **Throughput** | ~45 k tokens/s on 8×A100 |
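
The training code itself is not included in this repo. For orientation only, a bare-bones TRL GRPO setup with a custom reward function mirrors the settings above roughly as sketched below; the dataset path, batch sizes, and reward list are placeholders, not the actual recipe, and the real run used Unsloth’s patched model for the 2× speedup (plain TRL shown here for brevity):

```python
import json
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def json_validity_reward(completions, **kwargs):
    """One scalar per sampled completion: +1 for parseable JSON, -1 otherwise.
    Completions arrive as plain strings when the dataset uses a string "prompt" column."""
    scores = []
    for text in completions:
        try:
            json.loads(text)
            scores.append(1.0)
        except json.JSONDecodeError:
            scores.append(-1.0)
    return scores

# Placeholder dataset: expects a "prompt" column with the extraction prompts.
dataset = load_dataset("json", data_files="train_prompts.jsonl", split="train")

config = GRPOConfig(
    output_dir="qwen2.5-3b-grpo-json",
    num_generations=8,               # completions sampled per prompt (the "group")
    max_completion_length=256,
    per_device_train_batch_size=8,   # must be divisible by num_generations
    gradient_accumulation_steps=8,   # matches the gradient-accumulation setting above
    bf16=True,
)

trainer = GRPOTrainer(
    model="unsloth/Qwen2.5-3B-Instruct",
    reward_funcs=[json_validity_reward],  # an LM-judge reward callable would be added here too
    args=config,
    train_dataset=dataset,
)
trainer.train()
```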

---

## 📊 Planned Eval

* **Exact-Match JSON Accuracy**
* **Structural F1**
* **Valid-JSON Rate**

Benchmarks incoming—watch this space. 🛰️
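
Until then, the three metrics above are easy to reproduce yourself. A minimal sketch over (prediction, reference) JSON string pairs; "structural F1" is computed here over top-level key sets, which is one simple reading of the metric, not an official definition:

```python
import json

def safe_load(s):
    try:
        return json.loads(s)
    except json.JSONDecodeError:
        return None

def evaluate(pairs):
    """pairs: list of (predicted_json_str, reference_json_str)."""
    valid = exact = 0
    f1_total = 0.0
    for pred_s, ref_s in pairs:
        pred, ref = safe_load(pred_s), json.loads(ref_s)
        if pred is None:
            continue  # counts against the valid-JSON rate; F1 contribution stays 0
        valid += 1
        exact += int(pred == ref)
        p, r = set(pred), set(ref)
        tp = len(p & r)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(r) if r else 0.0
        f1_total += 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    n = len(pairs)
    return {
        "valid_json_rate": valid / n,
        "exact_match": exact / n,
        "structural_f1": f1_total / n,
    }

print(evaluate([('{"a": 1}', '{"a": 1}'), ('not json', '{"a": 1}')]))
```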

---

## 🤝 Citation

If you build something cool with this model, a shout-out would be lovely:

```bibtex
@misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo,
  author       = {Bhaviktheslider},
  year         = {2025},
  howpublished = {Hugging Face},
  note         = {\url{https://huggingface.co/bhaviktheslider/unsloth-qwen2.5-3b-grpo-json-structurer}}
}
```

*May your JSON always parse and your losses always converge!* 😎
|