---
base_model: Qwen/Qwen3-14B
license: apache-2.0
datasets:
- math
metrics:
- accuracy
pipeline_tag: text-generation
language:
- en
---
# Qwen/Qwen3-14B-GRPO-MATH-1EPOCH
**Description:**
Qwen3-14B fine-tuned with GRPO (Group Relative Policy Optimization) on the MATH dataset for one epoch.
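
For readers unfamiliar with GRPO, the sketch below illustrates the group-relative advantage that gives the method its name: several answers are sampled for one prompt, and each answer's reward is normalized against the mean and standard deviation of its group. This is not the training code used for this model. The function name, the `eps` constant, the binary correctness rewards, and the choice of population standard deviation are all assumptions made for illustration, and real implementations may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples for the same prompt.

    Hypothetical helper: `eps` avoids division by zero when every sample
    in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed rewards for four sampled answers to one MATH problem:
# 1.0 if the final answer matched the reference, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Correct answers get a positive advantage, incorrect ones a negative one.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```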
---
## Citation
```bibtex
@article{zhao2025learning,
title = {Learning to Reason without External Rewards},
author = {Zhao, Xuandong and Kang, Zhewei and Feng, Aosong and Levine, Sergey and Song, Dawn},
journal = {arXiv preprint arXiv:2505.19590},
year = {2025}
}
```