Qwen3-14B-GRPO-MATH-1EPOCH / README.md

Xuandong

Create README.md

9493b90 verified 6 months ago

preview code

raw

history blame

552 Bytes

metadata

base_model: Qwen/Qwen3-14B
license: apache-2.0
datasets:
  - math
metrics:
  - accuracy
pipeline_tag: text-generation
language:
  - en

Qwen/Qwen3-14B-GRPO-MATH-1EPOCH

Description:

A GRPO-fine-tuned version of Qwen3-14B trained on the MATH dataset.

Citation

@article{zhao2025learning,
  title   = {Learning to Reason without External Rewards},
  author  = {Zhao, Xuandong and Kang, Zhewei and Feng, Aosong and Levine, Sergey and Song, Dawn},
  journal = {arXiv preprint arXiv:2505.19590},
  year    = {2025}
}