sunblaze-ucb/Qwen3-14B-GRPO-MATH-1EPOCH

Text Generation · reinforcement-learning · text-generation-inference


Xuandong committed on Jun 16 · commit 9493b90 (verified) · 1 parent: 8424ce3

Create README.md

Files changed (1)

README.md +30 -0

README.md ADDED

@@ -0,0 +1,30 @@

+---
+base_model: Qwen/Qwen3-14B
+license: apache-2.0
+datasets:
+  - math
+metrics:
+  - accuracy
+pipeline_tag: text-generation
+language:
+  - en
+---
+
+# sunblaze-ucb/Qwen3-14B-GRPO-MATH-1EPOCH
+
+**Description:**
+
+Qwen3-14B fine-tuned for one epoch on the MATH dataset with GRPO (Group Relative Policy Optimization).
+
+---
+
+## Citation
+
+```bibtex
+@article{zhao2025learning,
+  title   = {Learning to Reason without External Rewards},
+  author  = {Zhao, Xuandong and Kang, Zhewei and Feng, Aosong and Levine, Sergey and Song, Dawn},
+  journal = {arXiv preprint arXiv:2505.19590},
+  year    = {2025}
+}
+```
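The card names GRPO but does not document the training setup (reward function, group size, or hyperparameters). As a rough illustration only, the Python sketch below shows the group-relative advantage step that gives GRPO (Group Relative Policy Optimization) its name: several completions are sampled per problem, each receives a scalar reward, and each reward is standardized against the mean and standard deviation of its own group. The binary exact-match reward on a `\boxed{...}` answer and the helper names `extract_boxed`, `accuracy_reward`, and `group_relative_advantages` are hypothetical assumptions, not the reward or code used to train this checkpoint.

```python
import re
from statistics import mean, pstdev
from typing import List, Optional

# Hypothetical helper: pull the last \boxed{...} answer out of a completion.
# Limitation: nested braces (e.g. \boxed{\frac{1}{2}}) are not matched.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed(completion: str) -> Optional[str]:
    matches = BOXED.findall(completion)
    return matches[-1].strip() if matches else None


# Hypothetical reward: 1.0 for an exact string match with the reference, else 0.0.
# This is an assumption; the model card does not state the reward used.
def accuracy_reward(completion: str, reference: str) -> float:
    answer = extract_boxed(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


# Group-relative advantage: standardize each reward against the other
# completions sampled for the same prompt (population std; eps avoids /0).
def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # One prompt, a group of four sampled completions, reference answer "4".
    group = [r"... so the answer is \boxed{4}", r"\boxed{5}", r"\boxed{ 4 }", "no final answer"]
    rewards = [accuracy_reward(c, "4") for c in group]
    print(rewards)                             # [1.0, 0.0, 1.0, 0.0]
    print(group_relative_advantages(rewards))  # approx [1.0, -1.0, 1.0, -1.0]
```

The full GRPO objective also includes a clipped policy-ratio term and a KL penalty toward a reference model; both are omitted here. Some implementations use the sample rather than the population standard deviation, and a group where every completion gets the same reward contributes zero advantage.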