Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward • Paper • 2510.03222 • Published 26 days ago • 45
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration • Paper • 2508.13755 • Published Aug 19 • 14
Reinforcing Diffusion Models by Direct Group Preference Optimization • Paper • 2510.08425 • Published 20 days ago • 10