GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving • Paper • 2510.11769 • Published 15 days ago • 25
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models • Paper • 2505.10554 • Published May 15 • 120
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL • Paper • 2505.02391 • Published May 5 • 25
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training • Paper • 2504.13161 • Published Apr 17 • 93
Predictive Data Selection: The Data That Predicts Is the Data That Teaches • Paper • 2503.00808 • Published Mar 2 • 56
Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data • Paper • 2302.12822 • Published Feb 24, 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment • Paper • 2304.06767 • Published Apr 13, 2023 • 2
FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation • Paper • 2408.12168 • Published Aug 22, 2024