papers-to-read
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper
• 2505.24726
• Published
• 277
Reinforcement Pre-Training
Paper
• 2506.08007
• Published
• 263
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper
• 2507.01006
• Published
• 251
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
• Published
• 261
MemOS: A Memory OS for AI System
Paper
• 2507.03724
• Published
• 159
GUI-G^2: Gaussian Reward Modeling for GUI Grounding
Paper
• 2507.15846
• Published
• 133
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper
• 2507.16784
• Published
• 122
WebSailor: Navigating Super-human Reasoning for Web Agent
Paper
• 2507.02592
• Published
• 124
4KAgent: Agentic Any Image to 4K Super-Resolution
Paper
• 2507.07105
• Published
• 106
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
Paper
• 2507.22827
• Published
• 100
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper
• 2508.01191
• Published
• 238
Paper
• 2508.10104
• Published
• 297
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper
• 2508.06471
• Published
• 206
VeriGUI: Verifiable Long-Chain GUI Dataset
Paper
• 2508.04026
• Published
• 162
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Paper
• 2508.05629
• Published
• 183
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper
• 2508.16153
• Published
• 160
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper
• 2509.08721
• Published
• 662
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Paper
• 2508.18106
• Published
• 348
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper
• 2509.02547
• Published
• 230
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
• Published
• 190
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
Paper
• 2508.21148
• Published
• 140
Why Language Models Hallucinate
Paper
• 2509.04664
• Published
• 196
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
• 2509.07980
• Published
• 104
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
Paper
• 2509.13312
• Published
• 105
Scaling Agents via Continual Pre-training
Paper
• 2509.13310
• Published
• 117
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
Paper
• 2509.06501
• Published
• 82
Towards a Unified View of Large Language Model Post-Training
Paper
• 2509.04419
• Published
• 76
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
Paper
• 2509.01055
• Published
• 79
MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML
Paper
• 2509.06806
• Published
• 64
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper
• 2509.15207
• Published
• 116