DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation Paper • 2510.09116 • Published 17 days ago • 94
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety Paper • 2510.08240 • Published 18 days ago • 40
The Era of Real-World Human Interaction: RL from User Conversations Paper • 2509.25137 • Published 28 days ago • 18
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense Paper • 2510.07242 • Published 19 days ago • 30
LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge Signals Paper • 2509.21875 • Published Sep 26 • 9
Jointly Reinforcing Diversity and Quality in Language Model Generations Paper • 2509.02534 • Published Sep 2 • 25
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics Paper • 2508.18124 • Published Aug 25 • 48
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11 • 47
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks Paper • 2507.23751 • Published Jul 31 • 4
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems Paper • 2508.07407 • Published Aug 10 • 97
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published Jul 14 • 88
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6 • 185