---
base_model: Qwen/Qwen3-14B
license: apache-2.0
datasets:
  - math
metrics:
  - accuracy
pipeline_tag: text-generation
language:
  - en
---

# Qwen/Qwen3-14B-GRPO-MATH-1EPOCH

**Description:**

A version of Qwen3-14B fine-tuned with GRPO (Group Relative Policy Optimization) on the MATH dataset for one epoch.
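
For readers unfamiliar with GRPO, the sketch below illustrates its core idea: several completions are sampled per prompt, and each completion's reward is normalized against the other completions in the same group. This is a minimal illustration, not the training code used for this model. The function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the binary correctness reward are all assumptions made for the example.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled completions within their group.

    Assumption: population standard deviation plus a small eps for stability;
    actual GRPO implementations may differ in these details.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group: four completions sampled for one MATH problem,
# scored 1.0 if the final answer is correct and 0.0 otherwise (assumed reward).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean receive a positive advantage and those below receive a negative one. If every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.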

---

## Citation

```bibtex
@article{zhao2025learning,
  title   = {Learning to Reason without External Rewards},
  author  = {Zhao, Xuandong and Kang, Zhewei and Feng, Aosong and Levine, Sergey and Song, Dawn},
  journal = {arXiv preprint arXiv:2505.19590},
  year    = {2025}
}
```