## Overview
QwQ‑32B‑Distill‑Qwen‑1.5B‑Alpha is a groundbreaking language model built on top of the DeepSeek‑R1‑Distill‑Qwen‑1.5B base. Developed entirely by a solo innovator—with valuable inspiration from Berkeley’s research—the model employs a novel reinforcement learning distillation framework that dramatically enhances performance while keeping training data requirements and compute costs to a minimum. Despite having only 1.5B parameters, the model achieves a striking 47.18 MMLU score and outperforms prior baselines on multiple math and reasoning benchmarks.
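
The framework itself is only described at a high level here. As a rough illustration of what reinforcement-learning-flavored distillation can look like, the sketch below trains a student to match a teacher's softened token distribution, with each sampled trajectory re-weighted by a scalar reward (e.g., answer correctness). Every name and hyperparameter in it (the loss shape, temperature, reward scheme) is an illustrative assumption, not this model's actual implementation.

```python
# Illustrative sketch only: a generic reward-weighted distillation loss.
# The temperature, reward values, and weighting scheme are assumptions.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, reward, temperature=2.0):
    """KL(teacher || student) per sequence, re-weighted by a scalar reward.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    reward: (batch,), e.g. 1.0 if the sampled answer was correct, else 0.1
    """
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Token-level KL divergence, summed over vocab, averaged over the sequence.
    kl = F.kl_div(s, t, reduction="none").sum(-1).mean(-1)  # (batch,)
    # High-reward trajectories contribute more to the student update.
    return (reward * kl).mean() * temperature**2

# Toy usage with random tensors standing in for real model outputs.
B, L, V = 2, 8, 100
student_logits = torch.randn(B, L, V, requires_grad=True)
teacher_logits = torch.randn(B, L, V)  # the teacher stays frozen
loss = distill_loss(student_logits, teacher_logits, torch.tensor([1.0, 0.1]))
loss.backward()  # gradients flow only into the student
```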

---
## Data
Our training dataset comprises 6,170 meticulously curated problem–answer pairs drawn from high-quality sources.

By focusing on a lean yet highly informative dataset, the model efficiently learns critical reasoning capabilities without the burden of excessive data volume.
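
The card does not state how these pairs are stored; a common convention for datasets of this kind is one JSON object per line. The loader below is a sketch under that assumption, with the file name and the `problem`/`answer` field names invented for illustration.

```python
# Hypothetical loader: assumes a JSONL file with "problem"/"answer" fields.
# The actual file name and schema are not documented in this card.
import json

def load_pairs(path="train.jsonl"):
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line_num, line in enumerate(f, 1):
            record = json.loads(line)
            # Fail fast on malformed records rather than training on them.
            if not record.get("problem") or not record.get("answer"):
                raise ValueError(f"line {line_num}: missing problem/answer")
            pairs.append((record["problem"], record["answer"]))
    return pairs

# With the dataset described above, len(load_pairs(...)) would be 6170.
```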

---
## Training Recipe
To maximize performance with minimal resources, QwQ‑32B‑Distill‑Qwen‑1.5B‑Alpha utilizes an innovative training strategy; in particular, progressive expansion of the context length is used to boost model performance.

Training was carried out using <b>H200 GPUs for 336 hours</b> at an exceptionally low cost of approximately <b>$1,341</b> (about $4 per H200-hour). This carefully engineered approach makes it possible to obtain state-of-the-art performance with very limited training data.
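
The card does not publish the exact schedule used for the progressive context expansion mentioned above. The sketch below shows the general mechanism, training in stages that each raise the maximum sequence length; the step counts and lengths are invented for illustration, not this model's actual settings.

```python
# Hypothetical context-length schedule; the real stage boundaries and
# lengths used for this model are not given in the card.
SCHEDULE = [
    (0,     8_192),   # warm-up stage: short contexts, cheap steps
    (2_000, 16_384),  # intermediate stage
    (4_000, 24_576),  # final stage: longest contexts
]

def max_context_for_step(step: int) -> int:
    """Return the max sequence length in effect at a given training step."""
    length = SCHEDULE[0][1]
    for start_step, ctx in SCHEDULE:
        if step >= start_step:
            length = ctx
    return length

assert max_context_for_step(0) == 8_192
assert max_context_for_step(3_000) == 16_384
assert max_context_for_step(9_999) == 24_576
```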

---
## Evaluation
The model has been rigorously evaluated on a variety of challenging benchmarks. Below is a snapshot of the results:

| Benchmark | Pass@1 | Cons@64 | Avg. Token Count |
|-----------|--------|---------|------------------|
| **MMLU** | 47.18 | – | – |
| **AIME 2024** | 27.76 | 43.33 | 21,191 |
| **AIME 2025-I** | 34.58 | 40.00 | 17,952 |
| **AIME 2025-II** | 21.56 | 33.33 | 21,376 |
| **AMC 2023** | 75.00 | – | – |
| **MATH 5000** | 58.92 | – | – |
| **Average** | 44.17 | 38.89 | 20,173 |
### Comparison
| Model | Base Model | Parameters | Training Time | Training Cost (Cloud GPU) | Training Data Size | MMLU Pass@1 | AIME 2024 Pass@1 | AIME 2024 Cons@64 | AIME 2024 Avg. Tokens | AIME 2025-I Pass@1 | AIME 2025-I Cons@64 | AIME 2025-I Avg. Tokens | AIME 2025-II Pass@1 | AIME 2025-II Cons@64 | AIME 2025-II Avg. Tokens | AMC 2023 Pass@1 | MATH 5000 Pass@1 | Avg. Pass@1 | Avg. Cons@64 | Avg. Tokens |
|-------|------------|------------|---------------|---------------------------|--------------------|-------------|------------------|-------------------|-----------------------|--------------------|---------------------|-------------------------|---------------------|----------------------|--------------------------|-----------------|------------------|-------------|--------------|-------------|
| DeepScaleR-1.5B-Preview | DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | A100*3600H | $6,804 | 40,000 | 36.15 | <strong>39.37</strong> | <strong>56.67</strong> | 29,888 | <strong>36.77</strong> | <strong>53.33</strong> | 33,151 | 20.00 | 20.00 | 35,293 | <strong>80.00</strong> | <strong>61.96</strong> | <strong>45.87</strong> | <strong>43.33</strong> | 32,777 |
| <strong>QwQ-32B-Distill-Qwen-1.5B-Alpha</strong> | DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | <strong>H200*336H</strong> | <strong>$1,341</strong> | <strong>6,170</strong> | <strong>47.18</strong> | 27.76 | 43.33 | <strong>21,191</strong> | 34.58 | 40.00 | <strong>17,952</strong> | <strong>21.56</strong> | <strong>33.33</strong> | <strong>21,376</strong> | 75.00 | 58.92 | 44.17 | 38.89 | <strong>20,173</strong> |
| Phi-4-mini-instruct | – | 3B | – | – | – | – | 13.79 | 13.79 | 2,055 | 20.00 | 20.00 | 2,001 | 0.00 | 0.00 | 1,970 | – | – | 11.26 | 11.26 | 2,009 |
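
For readers unfamiliar with the metrics: Pass@1 scores a single sampled answer per problem, while Cons@64 samples 64 answers and scores the majority (consensus) answer. The sketch below shows one way Cons@64 could be computed; the function and data layout are illustrative, not the evaluation harness actually used here.

```python
# Minimal sketch of consensus scoring. `samples` holds 64 model answers
# per problem (already reduced to final answers), `gold` the references.
from collections import Counter

def cons_at_k(samples: list[list[str]], gold: list[str]) -> float:
    """Fraction of problems whose majority-vote answer matches the gold."""
    correct = 0
    for answers, target in zip(samples, gold):
        majority, _ = Counter(answers).most_common(1)[0]
        correct += majority == target
    return 100.0 * correct / len(gold)

# Example: two problems, 64 sampled answers each (abbreviated here).
print(cons_at_k([["42"] * 40 + ["41"] * 24, ["7"] * 64], ["42", "8"]))  # 50.0
```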

---
## Serving QwQ‑32B‑Distill‑Qwen‑1.5B‑Alpha
Deploy your model effortlessly using high-performance inference systems, including:
- **vLLM**
- **Hugging Face Text Generation Inference (TGI)**
- **SGLang**
- **TensorRT-LLM**

All these systems support the OpenAI Chat Completions API format, ensuring smooth integration into your applications.
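
As a minimal example, assuming the model is served through vLLM's OpenAI-compatible server on its default port (the base URL and served model id below are placeholders; adjust them to your deployment), any OpenAI client can query it:

```python
# Querying an OpenAI-compatible endpoint (vLLM, TGI, SGLang, and
# TensorRT-LLM all expose this format). URL, key, and model id are
# placeholders for your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # default vLLM server address
    api_key="EMPTY",                      # local servers ignore the key
)

response = client.chat.completions.create(
    model="QwQ-32B-Distill-Qwen-1.5B-Alpha",  # must match the served model id
    messages=[
        {"role": "user",
         "content": "Solve: if 3x + 5 = 20, what is x?"},
    ],
    temperature=0.6,
    max_tokens=2048,  # reasoning models may emit long chains of thought
)
print(response.choices[0].message.content)
```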

---
## License
This project is released under the **MIT License**, reflecting our commitment to open and accessible AI. We firmly believe that cutting-edge AI research should be available for anyone to use, modify, and build upon.

---
## Acknowledgements
A huge thank you is due to the open-source community and the team behind **DeepSeek‑R1‑Distill‑Qwen‑1.5B**, whose groundwork made this innovation possible. This model is a testament to what can be achieved with passion, creativity, and a singular vision—developed entirely by a solo researcher while drawing inspiration from Berkeley’s forward-thinking research ecosystem.