sunblaze-ucb
/

Qwen3-14B-GRPO-MATH-1EPOCH

Text Generation

reinforcement-learning

text-generation-inference

Model card Files Files and versions

Qwen3-14B-GRPO-MATH-1EPOCH / README.md

Xuandong's picture

Create README.md

9493b90 verified 6 months ago

|

552 Bytes

	---
	base_model: Qwen/Qwen3-14B
	license: apache-2.0
	datasets:
	- math
	metrics:
	- accuracy
	pipeline_tag: text-generation
	language:
	- en
	---

	# Qwen/Qwen3-14B-GRPO-MATH-1EPOCH

	Description:

	A GRPO-fine-tuned version of Qwen3-14B trained on the MATH dataset.

	---

	## Citation

	```bibtex
	@article{zhao2025learning,
	title = {Learning to Reason without External Rewards},
	author = {Zhao, Xuandong and Kang, Zhewei and Feng, Aosong and Levine, Sergey and Song, Dawn},
	journal = {arXiv preprint arXiv:2505.19590},
	year = {2025}
	}
	```