Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity • Paper • 2510.01171 • Published 27 days ago • 17
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning • Paper • 2506.24119 • Published Jun 30 • 50
WHEN TO ACT, WHEN TO WAIT: Modeling Structural Trajectories for Intent Triggerability in Task-Oriented Dialogue • Paper • 2506.01881 • Published Jun 2 • 6
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement • Paper • 2409.11378 • Published Sep 17, 2024 • 1
Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models? • Paper • 2406.12822 • Published Jun 18, 2024