# Model Card for Qwen2.5-7B-ECHO-MATH-GRPO
Based on Qwen2.5-7B, we trained this model with the ECHO framework using GRPO on the Eurus-2-RL-Math dataset.
It outperforms Qwen2.5-32B on all six test sets, with an average improvement of 12 percentage points.
Table 2: Model performance on math reasoning tasks. For AIME and AMC, results are avg@32.
| Model | AIME24 | AIME25 | AMC | MATH-500 | OlympiadBench | Minerva | Avg. |
|---|---|---|---|---|---|---|---|
| Qwen2.5-7B | 2.7 | 1.9 | 22.0 | 44.6 | 19.7 | 20.9 | 18.6 |
| Qwen2.5-32B | 5.3 | 2.1 | 27.9 | 62.4 | 25.4 | 33.5 | 26.1 |
| Qwen2.5-7B-ECHO (GRPO) | 13.1 | 6.9 | 45.6 | 75.4 | 37.0 | 50.7 | 38.1 |
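For AIME and AMC, avg@32 means accuracy averaged over 32 sampled generations per problem. A minimal sketch of how such a score can be computed (the `avg_at_k` helper and the correctness judgments are illustrative, not part of the released evaluation code):

```python
def avg_at_k(is_correct_samples):
    """avg@k: mean correctness over k sampled generations per problem.

    is_correct_samples: list of lists; is_correct_samples[i][j] is True
    if the j-th sampled answer for problem i was graded correct.
    Returns the score as a percentage.
    """
    per_problem = [sum(samples) / len(samples) for samples in is_correct_samples]
    return 100.0 * sum(per_problem) / len(per_problem)

# Two problems, 4 samples each (k=4 for brevity; the card reports k=32).
score = avg_at_k([[True, True, False, True], [False, False, True, False]])
print(score)  # 50.0
```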
## Quick start

```python
from transformers import pipeline

# Example question; replace with your own math problem.
question = "Find the sum of the first 100 positive integers."

generator = pipeline(
    "text-generation",
    model="GradientResearch/Qwen2.5-7B-ECHO-MATH-GRPO",
    device="cuda",
)
output = generator(
    [{"role": "user", "content": question}],
    max_new_tokens=128,
    return_full_text=False,
)[0]
print(output["generated_text"])
```
## Citation

If you find our work helpful, please cite our paper:
```bibtex
@misc{xiao2025echodecouplinginferencetraining,
  title={Echo: Decoupling Inference and Training for Large-Scale RL Alignment on Heterogeneous Swarms},
  author={Jie Xiao and Changyuan Fan and Qingnan Ren and Alfred Long and Yuchen Zhang and Rymon Yu and Eric Yang and Lynn Ai and Shaoduo Gan},
  year={2025},
  eprint={2508.05387},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.05387},
}
```