SuperHF: Supervised Iterative Learning from Human Feedback Paper • 2310.16763 • Published Oct 25, 2023
Escalation Risks from Language Models in Military and Diplomatic Decision-Making Paper • 2401.03408 • Published Jan 7, 2024
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning Paper • 2403.03218 • Published Mar 5, 2024
Welfare Diplomacy: Benchmarking Language Model Cooperation Paper • 2310.08901 • Published Oct 13, 2023
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? Paper • 2406.04391 • Published Jun 6, 2024