-
Tool-integrated Reinforcement Learning for Repo Deep Search
Paper • 2508.03012 • Published • 20 -
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 70 -
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents
Paper • 2509.09265 • Published • 45 -
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 183
Collections
Collections including paper arxiv:2508.03012
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 249 • 96 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 70 -
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
Paper • 2508.03501 • Published • 56 -
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Paper • 2508.04700 • Published • 52 -
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems
Paper • 2508.01415 • Published • 7
-
openai/gpt-oss-120b
Text Generation • 120B • Updated • 3.69M • • 4.05k -
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation
Paper • 2508.03694 • Published • 50 -
Tool-integrated Reinforcement Learning for Repo Deep Search
Paper • 2508.03012 • Published • 20 -
QuantTrio/DeepSeek-V3.2-Exp-AWQ
Text Generation • Updated • 945 • 4
-
Rank1: Test-Time Compute for Reranking in Information Retrieval
Paper • 2502.18418 • Published • 28 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval
Paper • 2505.16967 • Published • 24 -
SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension
Paper • 2508.01959 • Published • 56