One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration • Paper • 2510.12088 • Published 13 days ago • 4
A Survey of Reinforcement Learning for Large Reasoning Models • Paper • 2509.08827 • Published Sep 10 • 183
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey • Paper • 2509.02547 • Published Sep 2 • 218
Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning • Paper • 2509.25052 • Published 27 days ago • 4
UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios • Paper • 2509.21766 • Published Sep 26 • 23
Quantile Advantage Estimation for Entropy-Safe Reasoning • Paper • 2509.22611 • Published about 1 month ago • 117
Understanding Tool-Integrated Reasoning • Collection • The official models and datasets for the paper "Understanding Tool-Integrated Reasoning" • 5 items • Updated Aug 27 • 2
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning • Paper • 2507.19457 • Published Jul 25 • 28
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations • Paper • 2504.10481 • Published Apr 14 • 84
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding • Paper • 2504.01943 • Published Apr 2 • 15