AXCXEPT committed (verified) · Commit 1c558aa · Parent: 458244e

Update README.md

Files changed: README.md (+33 −0)
## Overview
QwQ‑32B‑Distill‑Qwen‑1.5B‑Alpha is a groundbreaking language model built on top of the DeepSeek‑R1‑Distill‑Qwen‑1.5B base. Developed entirely by a solo innovator, with valuable inspiration from Berkeley’s research, the model employs a novel reinforcement learning distillation framework that dramatically enhances performance while keeping training data requirements and compute costs to a minimum. Despite having only 1.5B parameters, the model achieves a striking 47.18 MMLU score and outperforms prior baselines on multiple math and reasoning benchmarks.

---
 
## Data
Our training dataset comprises 6,170 meticulously curated problem–answer pairs drawn from high-quality sources.

By focusing on a lean yet highly informative dataset, the model efficiently learns critical reasoning capabilities without the burden of excessive data volume.

---
## Training Recipe
To maximize performance with minimal resources, QwQ‑32B‑Distill‑Qwen‑1.5B‑Alpha utilizes an innovative training strategy; in particular, progressive expansion of the context length is used to boost model performance.

Training was carried out using **H200 GPUs for 336 hours** at an exceptionally low cost of approximately **$1,341**. This carefully engineered approach makes it possible to obtain state-of-the-art performance with very limited training data.
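As a quick sanity check on the stated budget (assuming, as an illustration, that the $1,341 covers a single H200 for the full 336 hours), the implied hourly rate is:

```python
# Figures from the paragraph above; the single-GPU reading is our assumption.
total_cost_usd = 1341
gpu_hours = 336

rate = total_cost_usd / gpu_hours
print(round(rate, 2))  # ~3.99 USD per H200-hour
```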
 
---

## Evaluation
The model has been rigorously evaluated on a variety of challenging benchmarks. Below is a snapshot of the results:

| **AMC 2023** | 75.00 | 58.92 | 44.17 |
| **MATH 5000** | 38.89 | – | 20,173 |
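The tables report Pass@1 and Cons@64 metrics. Assuming Cons@64 denotes majority-vote (consensus) accuracy over 64 sampled answers, which is the usual reading, a toy sketch of how the two metrics differ:

```python
from collections import Counter

def pass_at_1(samples, gold):
    """Fraction of problems whose first sampled answer matches the gold answer."""
    return sum(s[0] == g for s, g in zip(samples, gold)) / len(gold)

def cons_at_k(samples, gold):
    """Fraction of problems where the majority (consensus) answer over k samples is correct."""
    correct = 0
    for s, g in zip(samples, gold):
        majority, _ = Counter(s).most_common(1)[0]
        correct += majority == g
    return correct / len(gold)

# Toy data: 2 problems, 4 samples each (the tables use 64 samples).
samples = [["42", "42", "7", "42"], ["3", "5", "5", "5"]]
gold = ["42", "3"]
print(pass_at_1(samples, gold))  # 1.0 — both first samples are correct
print(cons_at_k(samples, gold))  # 0.5 — problem 2's majority answer "5" is wrong
```

Consensus voting can score either above Pass@1 (noise averages out) or below it (a confidently wrong mode), which is why the tables list both.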
 
### Comparison
| Model | Base Model | Parameter Num | Training Time | GPU Amount (Cloud GPU) | Training Data Size | MMLU | AIME 2024 | AIME 2025-I | AIME 2025-II | AMC 2023 | MATH 5000 | Average |
|--------------------------------------|----------------------------------------|----------------|----------------|--------------------------|------------------|------|------------|--------------|---------------|-----------|-------------|---------|
| | | | | | | Pass@1 | Pass@1 | Cons@64 | Avg. Token Count | Pass@1 | Cons@64 | Avg. Token Count | Pass@1 | Cons@64 | Avg. Token Count | Pass@1 | Pass@1 | Pass@1 | Cons@64 | Token Count |
| DeepScaleR-1.5B-Preview | DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | A100*3600H | $6,804 | 40,000 | 36.15 | **39.37** | **56.67** | 29,888 | **36.77** | **53.33** | 33,151 | 20.00 | 20.00 | 35,293 | **80.00** | **61.96** | **45.87** | **43.33** | 32,777 |
| **QwQ-32B-Distill-Qwen-1.5B-Alpha** | DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | **H200*336H** | **$1,341** | **6,170** | **47.18** | 27.76 | 43.33 | **21,191** | 34.58 | 40.00 | **17,952** | **21.56** | **33.33** | **21,376** | 75.00 | 58.92 | 44.17 | 38.89 | **20,173** |
| Phi-4-mini-instruct | – | 3B | – | – | – | 13.79 | 13.79 | – | 2,055 | 20.00 | 20.00 | 2,001 | 0.00 | 0.00 | 1,970 | – | – | 11.26 | 11.26 | 2,009 |
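The efficiency gap versus DeepScaleR-1.5B-Preview can be sanity-checked directly from the table's cost and data columns:

```python
# Values copied from the comparison table above.
deepscaler_cost, alpha_cost = 6804, 1341    # cloud GPU cost, USD
deepscaler_data, alpha_data = 40000, 6170   # training examples

print(round(deepscaler_cost / alpha_cost, 1))  # ~5.1x lower training cost
print(round(deepscaler_data / alpha_data, 1))  # ~6.5x less training data
```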
---

## Serving QwQ‑32B‑Distill‑Qwen‑1.5B‑Alpha

Deploy your model effortlessly using high-performance inference systems, including:

- **vLLM**
- **Hugging Face Text Generation Inference (TGI)**
- **SGLang**
- **TensorRT-LLM**

All these systems support the OpenAI Chat Completions API format, ensuring smooth integration into your applications.
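Because these backends expose the OpenAI Chat Completions format, the request body is the same across all of them. A minimal sketch (the localhost URL and served model name are illustrative; port 8000 is vLLM's default for its OpenAI-compatible server):

```python
import json

# Illustrative endpoint: vLLM's OpenAI-compatible server defaults to port 8000.
url = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "QwQ-32B-Distill-Qwen-1.5B-Alpha",  # must match the name the server registered
    "messages": [
        {"role": "user", "content": "What is 12 * 13? Think step by step."},
    ],
    "max_tokens": 512,
    "temperature": 0.6,
}

body = json.dumps(payload)
# Once a server is running, POST `body` to `url` with any HTTP client, e.g.:
#   curl -X POST <url> -H "Content-Type: application/json" -d '<body>'
print(json.loads(body)["model"])
```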
---

## License

This project is released under the **MIT License**, reflecting our commitment to open and accessible AI. We firmly believe that cutting-edge AI research should be available for anyone to use, modify, and build upon.

---
## Acknowledgements

A huge thank you is due to the open-source community and the team behind **DeepSeek‑R1‑Distill‑Qwen‑1.5B**, whose groundwork made this innovation possible. This model is a testament to what can be achieved with passion, creativity, and a singular vision, developed entirely by a solo researcher drawing inspiration from Berkeley’s forward-thinking research ecosystem.