Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values • Paper • 2510.20187 • Published 4 days ago • 17
Accelerating Vision Transformers with Adaptive Patch Sizes • Paper • 2510.18091 • Published 6 days ago • 4
Unified Reinforcement and Imitation Learning for Vision-Language Models • Paper • 2510.19307 • Published 5 days ago • 22
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping • Paper • 2510.18927 • Published 5 days ago • 77
The Art of Scaling Reinforcement Learning Compute for LLMs • Paper • 2510.13786 • Published 11 days ago • 30
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization • Paper • 2510.13554 • Published 11 days ago • 54
HoneyBee: Data Recipes for Vision-Language Reasoners • Paper • 2510.12225 • Published 13 days ago • 9
AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model • Paper • 2510.11496 • Published 13 days ago • 3
Reinforcing Diffusion Models by Direct Group Preference Optimization • Paper • 2510.08425 • Published 17 days ago • 10