Collections

Discover the best community collections!

Collections including paper arxiv:2508.03012

Tool-integrated Reinforcement Learning for Repo Deep Search

Paper • 2508.03012 • Published Aug 5 • 20
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5 • 70
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

Paper • 2509.09265 • Published Sep 11 • 45
A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10 • 183

Tool-integrated Reinforcement Learning for Repo Deep Search

Paper • 2508.03012 • Published Aug 5 • 20

about 11 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 249 • 96
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5 • 70
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning

Paper • 2508.03501 • Published Aug 5 • 56
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Paper • 2508.04700 • Published Aug 6 • 52
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems

Paper • 2508.01415 • Published Aug 2 • 7

openai/gpt-oss-120b

Text Generation • 120B • Updated Aug 26 • 3.69M • 4.05k
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation

Paper • 2508.03694 • Published Aug 5 • 50
Tool-integrated Reinforcement Learning for Repo Deep Search

Paper • 2508.03012 • Published Aug 5 • 20
QuantTrio/DeepSeek-V3.2-Exp-AWQ

Text Generation • Updated 26 days ago • 945 • 4

Information_retrieval

Rank1: Test-Time Compute for Reranking in Information Retrieval

Paper • 2502.18418 • Published Feb 25 • 28
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval

Paper • 2505.16967 • Published May 22 • 24
SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension

Paper • 2508.01959 • Published Aug 3 • 56
