Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers Paper • 2406.16450 • Published Jun 24, 2024
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments Paper • 2509.14233 • Published Sep 17
Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions Paper • 2507.08068 • Published Jul 10
QRPO Reference Datasets Collection Datasets with reference completions and rewards used in the paper https://arxiv.org/abs/2507.08068. • 29 items