Model Card for Qwen2.5-7B-ECHO-MATH-GRPO

Based on Qwen2.5-7B, we trained the model with the ECHO framework using GRPO on the Eurus-2-RL-Math dataset.

It outperforms the larger Qwen2.5-32B baseline on all six test sets, with an average improvement of 12 percentage points.

Table 2: Model performance on math reasoning tasks. For AIME and AMC, results are avg@32.

| Model | AIME24 | AIME25 | AMC | MATH-500 | OlympiadBench | Minerva | Avg. |
|---|---|---|---|---|---|---|---|
| Qwen2.5-7B | 2.7 | 1.9 | 22.0 | 44.6 | 19.7 | 20.9 | 18.6 |
| Qwen2.5-32B | 5.3 | 2.1 | 27.9 | 62.4 | 25.4 | 33.5 | 26.1 |
| Qwen2.5-7B-ECHO (GRPO) | 13.1 | 6.9 | 45.6 | 75.4 | 37.0 | 50.7 | 38.1 |
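As a rough sketch of what the avg@32 metric above means (the helper name and 0/1 scoring are illustrative assumptions, not the actual evaluation code): each problem is answered 32 times, each sample is scored for correctness, and the per-problem accuracies are averaged across the benchmark.

```python
def avg_at_k(per_problem_scores):
    """Hypothetical avg@k: per_problem_scores is a list of lists,
    each inner list holding k binary (0/1) correctness scores."""
    # Mean correctness per problem, then mean over all problems, as a percentage.
    per_problem_means = [sum(s) / len(s) for s in per_problem_scores]
    return 100 * sum(per_problem_means) / len(per_problem_means)

# Two toy problems with k=4 samples each (illustrative only):
print(avg_at_k([[1, 0, 1, 1], [0, 0, 1, 0]]))  # 50.0
```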

Quick start

```python
from transformers import pipeline

question = "If x + 5 = 12, what is the value of x?"
generator = pipeline(
    "text-generation",
    model="GradientResearch/Qwen2.5-7B-ECHO-MATH-GRPO",
    device="cuda",
)
# Pass a chat-style message list; return_full_text=False strips the prompt from the output.
output = generator(
    [{"role": "user", "content": question}],
    max_new_tokens=128,
    return_full_text=False,
)[0]
print(output["generated_text"])
```

Citation

If you find our work helpful, please cite our paper:

@misc{xiao2025echodecouplinginferencetraining,
      title={Echo: Decoupling Inference and Training for Large-Scale RL Alignment on Heterogeneous Swarms}, 
      author={Jie Xiao and Changyuan Fan and Qingnan Ren and Alfred Long and Yuchen Zhang and Rymon Yu and Eric Yang and Lynn Ai and Shaoduo Gan},
      year={2025},
      eprint={2508.05387},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.05387}, 
}