The Alignment Waltz: Jointly Training Agents to Collaborate for Safety Paper • 2510.08240 • Published 18 days ago • 40
Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset Paper • 1811.00207 • Published Nov 1, 2018 • 1
Can You Put it All Together: Evaluating Conversational Agents' Ability to Blend Skills Paper • 2004.08449 • Published Apr 17, 2020 • 1
ROBBIE: Robust Bias Evaluation of Large Generative Language Models Paper • 2311.18140 • Published Nov 29, 2023 • 1
Improving Open Language Models by Learning from Organic Interactions Paper • 2306.04707 • Published Jun 7, 2023 • 3
"I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset Paper • 2205.09209 • Published May 18, 2022
BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage Paper • 2208.03188 • Published Aug 5, 2022 • 1
Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations Paper • 2411.10414 • Published Nov 15, 2024 • 1
Large Reasoning Models Learn Better Alignment from Flawed Thinking Paper • 2510.00938 • Published 26 days ago • 56
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 245