Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models • Paper • 2510.09259 • Published 22 days ago • 2
RLPR • Collection • Extrapolating RLVR to General Domains without Verifiers • 6 items • Updated Aug 7 • 4
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR • Paper • 2508.14029 • Published Aug 19 • 118