# Model Card for Qwen2.5-7B-ECHO-MATH-GRPO
Based on Qwen2.5-7B, we trained this model with the ECHO framework using GRPO on the Eurus-2-RL-Math dataset.
It outperforms Qwen2.5-32B on all six test sets, with an average improvement of 12 percentage points.
Table 2: Model performance on math reasoning tasks. For AIME and AMC, results are avg@32.
| Model | AIME24 | AIME25 | AMC | MATH-500 | OlympiadBench | Minerva | Avg. |
|---|---|---|---|---|---|---|---|
| Qwen2.5-7B | 2.7 | 1.9 | 22.0 | 44.6 | 19.7 | 20.9 | 18.6 |
| Qwen2.5-32B | 5.3 | 2.1 | 27.9 | 62.4 | 25.4 | 33.5 | 26.1 |
| Qwen2.5-7B-ECHO (GRPO) | 13.1 | 6.9 | 45.6 | 75.4 | 37.0 | 50.7 | 38.1 |
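For AIME and AMC, avg@32 means accuracy averaged over 32 sampled generations per problem. A minimal sketch of how such a score can be computed (the `avg_at_k` helper and the correctness judgments are illustrative, not part of the released evaluation code):

```python
def avg_at_k(is_correct_samples):
    """avg@k: mean correctness over k sampled generations per problem.

    is_correct_samples: list of lists; is_correct_samples[i][j] is True
    if the j-th sampled answer for problem i was graded correct.
    Returns the score as a percentage.
    """
    per_problem = [sum(samples) / len(samples) for samples in is_correct_samples]
    return 100.0 * sum(per_problem) / len(per_problem)

# Two problems, 4 samples each (k=4 for brevity; the card reports k=32).
score = avg_at_k([[True, True, False, True], [False, False, True, False]])
print(score)  # 50.0
```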
## Quick start

```python
from transformers import pipeline

# Example question; replace with your own math problem.
question = "Find the sum of the first 100 positive integers."

generator = pipeline(
    "text-generation",
    model="GradientResearch/Qwen2.5-7B-ECHO-MATH-GRPO",
    device="cuda",
)
output = generator(
    [{"role": "user", "content": question}],
    max_new_tokens=128,
    return_full_text=False,
)[0]
print(output["generated_text"])
```
## Citation

If you find our work helpful, please cite our paper:
```bibtex
@misc{xiao2025echodecouplinginferencetraining,
  title={Echo: Decoupling Inference and Training for Large-Scale RL Alignment on Heterogeneous Swarms},
  author={Jie Xiao and Changyuan Fan and Qingnan Ren and Alfred Long and Yuchen Zhang and Rymon Yu and Eric Yang and Lynn Ai and Shaoduo Gan},
  year={2025},
  eprint={2508.05387},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.05387},
}
```