- Reasoning with Sampling: Your Base Model is Smarter Than You Think (arXiv:2510.14901)
- Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values (arXiv:2510.20187)
- Accelerating Vision Transformers with Adaptive Patch Sizes (arXiv:2510.18091)
- Unified Reinforcement and Imitation Learning for Vision-Language Models (arXiv:2510.19307)
- BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping (arXiv:2510.18927)
- The Art of Scaling Reinforcement Learning Compute for LLMs (arXiv:2510.13786)
- Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization (arXiv:2510.13554)
- HoneyBee: Data Recipes for Vision-Language Reasoners (arXiv:2510.12225)
- AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model (arXiv:2510.11496)
- Reinforcing Diffusion Models by Direct Group Preference Optimization (arXiv:2510.08425)
- NorMuon: Making Muon more efficient and scalable (arXiv:2510.05491)
- Apertus: Democratizing Open and Compliant LLMs for Global Language Environments (arXiv:2509.14233, published Sep 17)
- Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (arXiv:2509.15194, published Sep 18)
- A Survey of Reinforcement Learning for Large Reasoning Models (arXiv:2509.08827, published Sep 10)
- The Majority is not always right: RL training for solution aggregation (arXiv:2509.06870, published Sep 8)