-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
Collections
Discover the best community collections!
Collections including paper arxiv:2506.14245
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 9 -
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 43 -
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30 -
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 84
-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 55 -
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 25 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 72 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 78
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 24 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 102 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75
-
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 131 -
Magistral
Paper • 2506.10910 • Published • 64 -
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
Paper • 2506.07240 • Published • 7 -
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Paper • 2506.09991 • Published • 55
-
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46 -
SafeArena: Evaluating the Safety of Autonomous Web Agents
Paper • 2503.04957 • Published • 21 -
Learning from Failures in Multi-Attempt Reinforcement Learning
Paper • 2503.04808 • Published • 18 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 24 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 102 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75
-
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 131 -
Magistral
Paper • 2506.10910 • Published • 64 -
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
Paper • 2506.07240 • Published • 7 -
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Paper • 2506.09991 • Published • 55
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 9 -
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 43 -
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30 -
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 84
-
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 46 -
SafeArena: Evaluating the Safety of Autonomous Web Agents
Paper • 2503.04957 • Published • 21 -
Learning from Failures in Multi-Attempt Reinforcement Learning
Paper • 2503.04808 • Published • 18 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113
-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 55 -
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 25 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 72 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 78