Xuandong's picture
Create README.md
9493b90 verified
|
raw
history blame
552 Bytes
metadata
base_model: Qwen/Qwen3-14B
license: apache-2.0
datasets:
  - math
metrics:
  - accuracy
pipeline_tag: text-generation
language:
  - en

Qwen/Qwen3-14B-GRPO-MATH-1EPOCH

Description:

A GRPO-fine-tuned version of Qwen3-14B trained on the MATH dataset.


Citation

@article{zhao2025learning,
  title   = {Learning to Reason without External Rewards},
  author  = {Zhao, Xuandong and Kang, Zhewei and Feng, Aosong and Levine, Sergey and Song, Dawn},
  journal = {arXiv preprint arXiv:2505.19590},
  year    = {2025}
}