---
license: mit
library_name: transformers
datasets:
- AI-MO/NuminaMath-CoT
- KbsdJames/Omni-MATH
- RUC-AIBOX/STILL-3-Preview-RL-Data
- hendrycks/competition_math
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
pipeline_tag: text-generation
tags:
- vllm
- qwen
- qwq
- deepseek
---

<div align="center">
<span style="font-family: default; font-size: 1.5em;">QwQ‑32B‑Distill‑Qwen‑1.5B‑Alpha</span>
<div>
- Solo Innovation: Breaking Performance Barriers with Minimal Resources -
<div><b>Powered by personal research with insights from agentica-org</b></div>
</div>
</div>

## Overview

QwQ‑32B‑Distill‑Qwen‑1.5B‑Alpha is a language model built on the DeepSeek‑R1‑Distill‑Qwen‑1.5B base. Developed by a solo researcher, drawing on insights from agentica-org's work at Berkeley, it uses a reinforcement-learning distillation framework that substantially improves performance while keeping training-data requirements and compute costs low. Despite having only 1.5B parameters, the model reaches a 47.18 MMLU score and outperforms prior baselines on several math and reasoning benchmarks.

---
## Data
Our training dataset comprises 6,170 carefully curated problem–answer pairs drawn from high-quality sources:

- AIME problems (generated with QwQ-32B)
- AMC problems (generated with QwQ-32B)
- MMLU problems (generated with QwQ-32B)
- Complementary academic math and reasoning datasets (generated with QwQ-32B)

By focusing on a lean yet highly informative dataset, the model learns critical reasoning capabilities without the burden of excessive data volume. All pairs were generated with QwQ-32B, with reference to the datasets listed in this card's metadata as well as other sources.
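
As an illustration of this generation step, the sketch below produces distillation pairs with QwQ-32B through vLLM's offline API. The dataset slice, sampling parameters, and output schema are assumptions for illustration, not the exact pipeline used for this model.

```python
# Hypothetical sketch: generating distillation answers with QwQ-32B.
# Dataset slice, sampling settings, and output schema are assumptions.
import json
from datasets import load_dataset
from vllm import LLM, SamplingParams

problems = load_dataset("AI-MO/NuminaMath-CoT", split="train[:100]")

llm = LLM(model="Qwen/QwQ-32B", max_model_len=32768)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=16384)

# Let QwQ-32B reason through each problem and keep its full solution text.
outputs = llm.generate([ex["problem"] for ex in problems], params)

with open("distill_pairs.jsonl", "w") as f:
    for ex, out in zip(problems, outputs):
        record = {"problem": ex["problem"], "answer": out.outputs[0].text}
        f.write(json.dumps(record) + "\n")
```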
---
## Training Recipe
To maximize performance with minimal resources, QwQ‑32B‑Distill‑Qwen‑1.5B‑Alpha utilizes an innovative training strategy that includes:
<b>- Scaled Group Relative Policy Optimization (GRPO):</b>
An adaptation of PPO that normalizes the advantage function across samples generated from the same prompt (a sketch follows at the end of this section).
<b>- KL Divergence Regularization:</b>
Additional regularization is applied on top of the surrogate loss to prevent significant policy drift.
<b>- Iterative Context Scaling:</b>
The context length is expanded progressively during training to boost performance while reducing compute costs.
Training was carried out using <b>H200 GPUs for 336 hours</b> at an exceptionally low cost of approximately <b>$1,341</b>. This carefully engineered approach makes it possible to obtain state-of-the-art performance with very limited training data.
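
For illustration, here is a minimal sketch of the group-relative advantage and the KL-regularized surrogate loss described above; function names, tensor shapes, and hyperparameter values are assumptions, not the exact training code used for this model.

```python
# Hypothetical sketch of GRPO-style advantages with a KL penalty toward a
# reference policy. Shapes and hyperparameters are illustrative assumptions.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) rewards for completions of the
    same prompt. Advantages are normalized within each prompt's group."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

def grpo_loss(logp_new, logp_old, logp_ref, advantages,
              clip_eps: float = 0.2, kl_coef: float = 0.04) -> torch.Tensor:
    """Clipped PPO-style surrogate plus a KL penalty that discourages drift
    from the reference policy. All inputs share the advantages' shape."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    surrogate = -torch.minimum(ratio * advantages, clipped * advantages).mean()
    # Unbiased k3 KL estimator: exp(ref - new) - (ref - new) - 1.
    log_ratio = logp_ref - logp_new
    kl = (torch.exp(log_ratio) - log_ratio - 1).mean()
    return surrogate + kl_coef * kl

# Iterative context scaling (illustrative): run successive training stages
# with a growing generation budget, e.g. 8k -> 16k -> 24k tokens.
```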
---
## Evaluation
The model has been rigorously evaluated on a variety of challenging benchmarks. Below is a snapshot of the results:

| **Benchmark** | **Pass@1** | **cons@64** | **Avg. Token Count** |
|------------------|---------------------|----------------------|----------------------|
| **MMLU** | 47.18 | – | – |
| **AIME 2024** | 33.33 | 53.33 | 21,191 |
| **AIME 2025-I** | 34.58 | 40.00 | 17,952 |
| **AIME 2025-II** | 21.56 | 33.33 | 21,376 |
| **AMC 2023** | 75.00 | 58.92 | 44.17 |
| **MATH 5000** | 38.89 | – | 20,173 |
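
For reference on the metrics: pass@1 scores a single sampled answer per problem, while cons@64 scores the majority answer across 64 samples. Below is a minimal sketch of both, assuming final answers have already been extracted and normalized from the generations:

```python
# Hypothetical sketch of pass@1 and cons@k scoring for one problem.
# Answer extraction/normalization is assumed to have happened upstream.
from collections import Counter

def pass_at_1(sample: str, reference: str) -> bool:
    """Score a single sampled final answer against the reference."""
    return sample == reference

def cons_at_k(samples: list[str], reference: str) -> bool:
    """Score the most common final answer among k samples (majority vote)."""
    majority, _ = Counter(samples).most_common(1)[0]
    return majority == reference
```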
---
## Serving QwQ‑32B‑Distill‑Qwen‑1.5B‑Alpha
Deploy your model effortlessly using high-performance inference systems, including:
- **vLLM**
- **Hugging Face Text Generation Inference (TGI)**
- **SGLang**
- **TensorRT-LLM**
All these systems support the OpenAI Chat Completions API format, ensuring smooth integration into your applications.

---
## How to Use
<b>Runs on a single A40 GPU!</b>
### Serving Model:
```shell
vllm serve AXCXEPT/QwQ-32B-Distill-Qwen-1.5B-Alpha --max-model-len 32768 --enforce-eager
```
### Call API Without Streaming:
```python
from openai import OpenAI

# Point the client at the local vLLM OpenAI-compatible server started above.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)

# Raw string so the LaTeX backslash in \frac survives as written.
prompt = r"""Every morning Aya goes for a $9$-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her 4 hours, including $t$ minutes spent in the coffee shop. When she walks $s+2$ kilometers per hour, the walk takes her 2 hours and 24 minutes, including $t$ minutes spent in the coffee shop. Suppose Aya walks at $s+\frac{1}{2}$ kilometers per hour. Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop."""

completion = client.chat.completions.create(
    model="AXCXEPT/QwQ-32B-Distill-Qwen-1.5B-Alpha",
    messages=[
        {"role": "user", "content": prompt}
    ],
)
print(completion.choices[0].message)
```
### Call API With Streaming:
```python
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Use whichever model the vLLM server is currently serving.
models = client.models.list()
model = models.data[0].id

# Raw string so the LaTeX backslash in \frac survives as written.
prompt = r"""Every morning Aya goes for a $9$-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her 4 hours, including $t$ minutes spent in the coffee shop. When she walks $s+2$ kilometers per hour, the walk takes her 2 hours and 24 minutes, including $t$ minutes spent in the coffee shop. Suppose Aya walks at $s+\frac{1}{2}$ kilometers per hour. Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop."""
messages = [{"role": "user", "content": prompt}]

stream = client.chat.completions.create(
    model=model,
    messages=messages,
    stream=True,
)

print("client: Start streaming chat completions...")
printed_reasoning_content = False
printed_content = False

for chunk in stream:
    reasoning_content = None
    content = None
    # Check whether this delta carries reasoning_content or regular content.
    if hasattr(chunk.choices[0].delta, "reasoning_content"):
        reasoning_content = chunk.choices[0].delta.reasoning_content
    elif hasattr(chunk.choices[0].delta, "content"):
        content = chunk.choices[0].delta.content

    if reasoning_content is not None:
        if not printed_reasoning_content:
            printed_reasoning_content = True
            print("reasoning_content:", end="", flush=True)
        print(reasoning_content, end="", flush=True)
    elif content is not None:
        if not printed_content:
            printed_content = True
            print("\ncontent:", end="", flush=True)
        # Print the answer tokens as they stream in.
        print(content, end="", flush=True)
```
## License
This project is released under the **MIT License**, reflecting our commitment to open and accessible AI. We firmly believe that cutting-edge AI research should be available for anyone to use, modify, and build upon.

---
## Special Thanks
We extend our sincere gratitude to the following teams and organizations whose contributions and ideas were instrumental in this project:
- Qwen Team (Alibaba Cloud): for creating the exceptional QwQ-32B model used as the distillation source.
- Agentica-org (Berkeley Sky Computing Lab and Berkeley AI Research): for valuable insights and pioneering reinforcement learning techniques.
- DeepSeek AI: for developing the robust foundational model upon which this research is built.
Their groundbreaking work made our innovations possible. |