---
base_model: Qwen/Qwen3-14B
license: apache-2.0
datasets:
- math
metrics:
- accuracy
pipeline_tag: text-generation
language:
- en
---

# Qwen/Qwen3-14B-GRPO-MATH-1EPOCH

**Description:** A GRPO-fine-tuned version of Qwen3-14B trained on the MATH dataset.

---

## Citation

```bibtex
@article{zhao2025learning,
  title   = {Learning to Reason without External Rewards},
  author  = {Zhao, Xuandong and Kang, Zhewei and Feng, Aosong and Levine, Sergey and Song, Dawn},
  journal = {arXiv preprint arXiv:2505.19590},
  year    = {2025}
}
```
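
## GRPO in brief

For readers unfamiliar with the "GRPO" in the model name: GRPO (Group Relative Policy Optimization) samples a group of completions for each prompt and scores every completion relative to the rest of its group, rather than against a learned value baseline. The snippet below is a minimal sketch of that group-relative advantage step only. The function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the binary correctness rewards in the example are illustrative assumptions; none of them are taken from this model's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize per-completion rewards within one prompt's sample group.

    Hypothetical helper for illustration: each advantage is the reward minus
    the group mean, divided by the group standard deviation (plus eps to
    avoid dividing by zero when every completion gets the same reward).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some trainers use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled answers to one MATH problem, reward 1.0 if correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct completions receive positive advantages and incorrect ones negative advantages, and a group in which every completion earns the same reward contributes no learning signal because all advantages are zero.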