Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning • Paper • 2510.03259 • Published about 1 month ago • 56
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense • Paper • 2510.07242 • Published 18 days ago • 30
First Try Matters: Revisiting the Role of Reflection in Reasoning Models • Paper • 2510.08308 • Published 17 days ago • 24
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward • Paper • 2510.03222 • Published 23 days ago • 44
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States • Paper • 2510.11052 • Published 14 days ago • 50
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment • Paper • 2510.10201 • Published 15 days ago • 35
Demystifying Reinforcement Learning in Agentic Reasoning • Paper • 2510.11701 • Published 13 days ago • 31
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs • Paper • 2510.11696 • Published 13 days ago • 165