Update README.md
README.md CHANGED
| **Finetuned from**     | `unsloth/Qwen2.5-3B-Instruct` |
| **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) × Hugging Face TRL |

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="190"/>](https://github.com/unslothai/unsloth)

---

## 🚀 What's New?

> *The protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured**: now with more neurons, zero SFT, and a league of reward functions.*

| Upgrade | Explanation |
|------------------------|------------------------------------------------------------------------------|
| **Bigger Backbone**    | 1.5 B → **3 B** Qwen 2.5 for bigger reasoning muscles. |
| **Pure RL**            | No supervised fine-tuning; the policy was learned *only* from reward signals (GRPO). |
| **LM-as-Judge**        | A separate LLM rates each candidate for correctness, JSON validity, style… |
| **2× Faster Training** | Unsloth's flash-attention and fused ops mean less VRAM and more speed. |

---

## 🛠️ Intended Use

* Convert messy prose, logs, or audit notes into a pristine JSON document that follows a complex, nested schema.
* Drop-in replacement for any pipeline using the older DeepSeek-R1 1.5 B structurer: just swap the checkpoint and enjoy the headroom (see the sketch below).
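
A minimal sketch of the swap, assuming the pipeline loads the checkpoint by repo id (the 1.5 B id comes from the note above; everything else here is illustrative):

```python
from transformers import pipeline

# Before: the older 1.5 B structurer
# structurer = pipeline(
#     "text-generation",
#     model="MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured",
# )

# After: only the checkpoint id changes; prompts and parsing stay the same.
structurer = pipeline(
    "text-generation",
    model="MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora",
)
```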

---

## 🧠 How to Use (Reasoning + JSON)

The snippet below:

1. **Primes** the model with the *exact* Pydantic schema, so it outputs the right keys.
2. Makes the model **think step-by-step** (reasoning) but still wraps the final JSON in an easy-to-parse container.
3. Uses the correct repo name: `MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora`.

```python
# ─────────────────────────────────────────────────────────────────────────────
# QUICK-START
# Structured-data extraction with reasoning + JSON output
# ─────────────────────────────────────────────────────────────────────────────
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch, json, textwrap, inspect
from pydantic import BaseModel
from typing import List, Optional

MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"

# 1️⃣ Inline schema (keeps the LLM on-rails) ──────────────────────────────────
class Table(BaseModel):             # minimal stub; the original assumes this
    rows: List[List[str]] = []      # type is defined elsewhere

class Checkbox(BaseModel):          # minimal stub, likewise
    label: str = ""
    checked: bool = False

class MultipleChoice(BaseModel):
    question: str
    options: List[str]
    selected: str

class FormField(BaseModel):
    fieldName: str
    value: str
    notes: Optional[str] = ""

class Calculation(BaseModel):
    formula: str
    result: str
    notes: Optional[str] = ""

class Metadata(BaseModel):
    reportDate: str
    auditorId: Optional[str] = None
    comments: Optional[str] = None

class Content(BaseModel):
    paragraphs: List[str]
    tables: List[Table]
    checkboxes: List[Checkbox]
    multipleChoice: List[MultipleChoice]
    formFields: List[FormField]
    calculations: List[Calculation]
    metadata: Optional[Metadata] = Metadata(reportDate="")

class Section(BaseModel):
    id: str
    title: str
    content: Content

class Document(BaseModel):
    documentTitle: str
    documentDate: str
    sections: List[Section]

SCHEMA_TEXT = inspect.getsource(Document)

# 2️⃣ Build prompts ───────────────────────────────────────────────────────────
SYSTEM_PROMPT = textwrap.dedent(f"""
    You are an expert **data-extraction assistant**.
    Extract structured info from unstructured text **exactly** following the Pydantic schema.

    ── Schema ──
    {SCHEMA_TEXT}
    ────────────

    Rules:
    1. Follow the schema for keys & nesting.
    2. Copy values verbatim when possible.
    3. If a field is missing, return null.
    4. Output your step-by-step reasoning first.
    5. Then return ONLY the JSON inside this wrapper:
       final answer[ json object: {{ ... }} ]

    Format:
    <reasoning>…</reasoning>
    <answer>
    final answer[ json object: {{ … }} ]
    </answer>
""").strip()

UNSTRUCTURED_TEXT = """
12 April 2025 – Onsite audit performed by Jane Smith.
Observations: Two fire extinguishers past expiry; emergency lights functional.
Calculations: Total extinguishers = 8, expired = 2 → 25 % overdue.
"""

USER_PROMPT = textwrap.dedent(f"""
    ### Task
    Convert the following unstructured text to the schema.

    ### Text
    {UNSTRUCTURED_TEXT}
""").strip()

# 3️⃣ Generate ────────────────────────────────────────────────────────────────
tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
gen = pipeline("text-generation", model=model, tokenizer=tok,
               max_new_tokens=512, do_sample=False)

prompt = f"<|system|>\n{SYSTEM_PROMPT}\n<|user|>\n{USER_PROMPT}"
raw_out = gen(prompt)[0]["generated_text"]

# 4️⃣ Slice out the JSON ──────────────────────────────────────────────────────
start = raw_out.find("final answer[")
end = raw_out.rfind("]") + 1
json_text = raw_out[start:end].split("json object:")[-1].strip(" []\n")
data = json.loads(json_text)  # ✅ raises if malformed

print(raw_out)                        # reasoning + JSON
print("\n✅ Parsed object:\n", data)
```
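
Since the schema classes are already in scope, you can round-trip the parsed dict through Pydantic to catch type errors early. A small add-on to the snippet above (assumes Pydantic v2; `data` and `Document` come from the snippet):

```python
# Validate the parsed dict against the schema; raises ValidationError on mismatch.
doc = Document.model_validate(data)   # Pydantic v1: Document.parse_obj(data)
print(doc.documentTitle, "|", len(doc.sections), "section(s)")
```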

### Why it Works 🧠

* **Schema-priming** ensures key-level fidelity: no "creative" field names.
* **Chain-of-thought** improves factual extraction (it was rewarded during GRPO).
* The `final answer[…]` wrapper makes downstream parsing a one-liner (see the sketch just below).
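
For instance, the slicing in step 4️⃣ collapses into a single regex. A sketch, assuming the model closes the wrapper with `]` as instructed:

```python
import re, json

# Grab everything between "json object:" and the final "]" of the wrapper.
match = re.search(r"final answer\[\s*json object:\s*(\{.*\})\s*\]", raw_out, re.DOTALL)
data = json.loads(match.group(1)) if match else None
```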

---

## 🏋️ Training Recipe (Condensed)

| Setting | Value |
| -------------- | ------------------------------------------------------------------- |
| **Algorithm**  | GRPO (policy = this LM; reward LM = `Qwen2.5-7B` w/ JSON-validator head) |
| **Epochs**     | 3 (effective) |
| **Batch**      | Grad-accum 8, bfloat16 |
| **Optimizer**  | Fused AdamW |
| **Throughput** | ≈ 45 k tokens/s on 8×A100 |
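
For readers who want to reproduce the shape of this recipe, here is a minimal sketch using TRL's `GRPOTrainer` with a toy JSON-validity reward. The actual reward functions, judge prompts, and training data are not published here; everything beyond the base checkpoint id is illustrative:

```python
import json
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def json_validity_reward(completions, **kwargs):
    """Toy reward: 1.0 if a completion parses as JSON, else 0.0.
    The real recipe also scored schema adherence and LM-judge ratings."""
    rewards = []
    for text in completions:
        try:
            json.loads(text)
            rewards.append(1.0)
        except json.JSONDecodeError:
            rewards.append(0.0)
    return rewards

# Placeholder prompt set; GRPOTrainer samples candidate completions itself.
prompt_dataset = Dataset.from_dict(
    {"prompt": ["Convert to JSON: John Doe, 28, software engineer in Austin."]}
)

trainer = GRPOTrainer(
    model="unsloth/Qwen2.5-3B-Instruct",   # base checkpoint named in this card
    reward_funcs=json_validity_reward,
    args=GRPOConfig(output_dir="grpo-json", bf16=True,
                    gradient_accumulation_steps=8),
    train_dataset=prompt_dataset,
)
trainer.train()
```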

---

## 📊 Evaluation (WIP)

| Metric | Status |
| ------------------------- | ------ |
| Exact-Match JSON Accuracy | 🚧 |
| Structural F1 | 🚧 |
| Valid-JSON Rate | 🚧 |

Stay tuned: numbers landing faster than you can say "schema validation." 🛰️
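
In the meantime, Valid-JSON Rate is easy to measure yourself. A sketch, assuming `outputs` is a list of JSON strings extracted from model generations:

```python
import json

def valid_json_rate(outputs: list[str]) -> float:
    """Fraction of extracted outputs that parse as JSON."""
    def parses(s: str) -> bool:
        try:
            json.loads(s)
            return True
        except json.JSONDecodeError:
            return False
    return sum(parses(s) for s in outputs) / max(len(outputs), 1)
```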

---

```bibtex
@misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo,
  title        = {An Unsloth-accelerated GRPO-trained Qwen 2.5-3B for JSON structuring},
  author       = {Bhaviktheslider},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora}}
}
```

*May your JSON always parse and your losses always converge!* 🎉