-
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
Paper • 2508.21104 • Published • 32 -
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables
Paper • 2508.19813 • Published • 25 -
No Label Left Behind: A Unified Surface Defect Detection Model for all Supervision Regimes
Paper • 2508.19060 • Published • 10 -
From reactive to cognitive: brain-inspired spatial intelligence for embodied agents
Paper • 2508.17198 • Published • 9
Collections
Discover the best community collections!
Collections including paper arxiv:2508.20931
-
Provable Benefits of In-Tool Learning for Large Language Models
Paper • 2508.20755 • Published • 11 -
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Paper • 2509.02479 • Published • 83 -
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench
Paper • 2508.20931 • Published • 15 -
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Paper • 2509.13761 • Published • 16
-
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 70 -
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
Paper • 2508.03501 • Published • 56 -
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Paper • 2508.04700 • Published • 52 -
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems
Paper • 2508.01415 • Published • 7
-
Provable Benefits of In-Tool Learning for Large Language Models
Paper • 2508.20755 • Published • 11 -
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
Paper • 2508.20453 • Published • 63 -
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench
Paper • 2508.20931 • Published • 15 -
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
Paper • 2509.08755 • Published • 56
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 249 • 96 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
Paper • 2508.21104 • Published • 32 -
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables
Paper • 2508.19813 • Published • 25 -
No Label Left Behind: A Unified Surface Defect Detection Model for all Supervision Regimes
Paper • 2508.19060 • Published • 10 -
From reactive to cognitive: brain-inspired spatial intelligence for embodied agents
Paper • 2508.17198 • Published • 9
-
Provable Benefits of In-Tool Learning for Large Language Models
Paper • 2508.20755 • Published • 11 -
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Paper • 2509.02479 • Published • 83 -
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench
Paper • 2508.20931 • Published • 15 -
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Paper • 2509.13761 • Published • 16
-
Provable Benefits of In-Tool Learning for Large Language Models
Paper • 2508.20755 • Published • 11 -
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
Paper • 2508.20453 • Published • 63 -
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench
Paper • 2508.20931 • Published • 15 -
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
Paper • 2509.08755 • Published • 56
-
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 70 -
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
Paper • 2508.03501 • Published • 56 -
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Paper • 2508.04700 • Published • 52 -
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems
Paper • 2508.01415 • Published • 7
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 249 • 96 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88