- Scaling Agents via Continual Pre-training
  Paper • 2509.13310 • Published • 112
- WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
  Paper • 2509.13305 • Published • 89
- In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
  Paper • 2510.05592 • Published • 94
- Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents
  Paper • 2510.14438 • Published • 13

Collections
Collections including paper arxiv:2509.13305
- WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
  Paper • 2507.15061 • Published • 59
- WebDancer: Towards Autonomous Information Seeking Agency
  Paper • 2505.22648 • Published • 33
- ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
  Paper • 2509.13313 • Published • 77
- WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
  Paper • 2509.13305 • Published • 89

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 266 • 96
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 36
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88

- Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
  Paper • 2410.02740 • Published • 54
- From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
  Paper • 2410.01215 • Published • 40
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
  Paper • 2409.17146 • Published • 121
- EuroLLM: Multilingual Language Models for Europe
  Paper • 2409.16235 • Published • 29

- AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
  Paper • 2402.15506 • Published • 18
- AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
  Paper • 2404.03648 • Published • 29
- Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
  Paper • 2405.19893 • Published • 33
- Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
  Paper • 2405.19888 • Published • 7

- Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
  Paper • 2509.08721 • Published • 672
- A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
  Paper • 2508.18106 • Published • 341
- VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
  Paper • 2509.09372 • Published • 233
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 219

- R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
  Paper • 2508.21113 • Published • 109
- Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
  Paper • 2508.16949 • Published • 22
- EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control
  Paper • 2508.21112 • Published • 75
- UItron: Foundational GUI Agent with Advanced Perception and Planning
  Paper • 2508.21767 • Published • 12

- Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
  Paper • 2503.22675 • Published • 36
- Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
  Paper • 2503.22230 • Published • 45
- ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
  Paper • 2509.13313 • Published • 77
- WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents
  Paper • 2509.13309 • Published • 67

- Bootstrapping Language Models with DPO Implicit Rewards
  Paper • 2406.09760 • Published • 40
- DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
  Paper • 2406.11931 • Published • 65
- Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
  Paper • 2406.14544 • Published • 35
- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 95