EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning
Paper • 2606.03108 • Published • 11
We advance the development of AGI and foster open source collaboration towards a smarter future.
ESPO: Early-Stopping Proximal Policy Optimization