Gabriel Mukobi

gmukobi

https://gabrielmukobi.com/

AI & ML interests

AI safety, robustness, interpretability, evaluations, value learning.

Organizations

None yet

Authored 5 papers

SuperHF: Supervised Iterative Learning from Human Feedback

Paper • 2310.16763 • Published Oct 25, 2023 • 1

Escalation Risks from Language Models in Military and Diplomatic Decision-Making

Paper • 2401.03408 • Published Jan 7, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Paper • 2403.03218 • Published Mar 5, 2024 • 1

Welfare Diplomacy: Benchmarking Language Model Cooperation

Paper • 2310.08901 • Published Oct 13, 2023

Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Paper • 2406.04391 • Published Jun 6, 2024 • 9