Commit 9493b90 (verified) by Xuandong, parent 8424ce3: Create README.md. Files changed (1): README.md, added (+30 -0).

---
base_model: Qwen/Qwen3-14B
license: apache-2.0
datasets:
- math
metrics:
- accuracy
pipeline_tag: text-generation
language:
- en
---

# Qwen/Qwen3-14B-GRPO-MATH-1EPOCH

**Description:**

A version of Qwen3-14B fine-tuned with GRPO (Group Relative Policy Optimization) on the MATH dataset for one epoch, as the model name indicates.
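
GRPO does not use a learned value model. Instead, it samples a group of completions for each prompt and scores each completion relative to the rest of its group. The sketch below covers only that advantage step. It assumes one scalar reward per completion and normalizes with the population standard deviation. The card does not state the reward function, group size, or normalization used for this checkpoint, and `group_relative_advantages` is a hypothetical helper, not part of any training library.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages for the completions sampled from one prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps). Using the population std is an
    assumption of this sketch; some implementations use the sample std.
    """
    n = len(rewards)
    if n == 0:
        return []
    mean = sum(rewards) / n
    # Population variance of the rewards within the group.
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    # eps keeps the division finite when every completion got the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four sampled answers to one MATH problem
    # (1.0 = judged correct, 0.0 = judged wrong); not data from this model.
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))
```

With rewards `[1, 0, 0, 1]`, the correct answers receive advantages close to +1 and the wrong ones close to -1. When every completion in a group earns the same reward, all advantages are zero, so that group contributes no advantage-weighted signal to the update.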

---

## Citation

```bibtex
@article{zhao2025learning,
  title   = {Learning to Reason without External Rewards},
  author  = {Zhao, Xuandong and Kang, Zhewei and Feng, Aosong and Levine, Sergey and Song, Dawn},
  journal = {arXiv preprint arXiv:2505.19590},
  year    = {2025}
}
```