-
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 141 -
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper • 2504.13837 • Published • 135 -
Learning to Reason under Off-Policy Guidance
Paper • 2504.14945 • Published • 88
Collections
Collections including paper arxiv:2507.15061
-
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Paper • 2507.15061 • Published • 59 -
WebWalker: Benchmarking LLMs in Web Traversal
Paper • 2501.07572 • Published • 22 -
WebSailor: Navigating Super-human Reasoning for Web Agent
Paper • 2507.02592 • Published • 121 -
WebDancer: Towards Autonomous Information Seeking Agency
Paper • 2505.22648 • Published • 33
-
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 131 -
Magistral
Paper • 2506.10910 • Published • 64 -
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
Paper • 2506.07240 • Published • 7 -
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Paper • 2506.09991 • Published • 55
-
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Paper • 2507.15061 • Published • 59 -
WebDancer: Towards Autonomous Information Seeking Agency
Paper • 2505.22648 • Published • 33 -
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
Paper • 2509.13313 • Published • 78 -
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
Paper • 2509.13305 • Published • 88
-
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
Paper • 2509.06501 • Published • 78 -
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Paper • 2508.18106 • Published • 340 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 218 -
Why Language Models Hallucinate
Paper • 2509.04664 • Published • 189
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 249 • 96 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Paper • 2311.12631 • Published • 15 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 56 -
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Paper • 2504.01956 • Published • 40 -
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
Paper • 2506.23219 • Published • 7