Gabriel Mukobi

gmukobi
https://gabrielmukobi.com/
  • gabemukobi
  • mukobi

AI & ML interests

AI safety, robustness, interpretability, evaluations, value learning.

Organizations

None yet

Authored 5 papers

SuperHF: Supervised Iterative Learning from Human Feedback

Paper • 2310.16763 • Published Oct 25, 2023 • 1

Escalation Risks from Language Models in Military and Diplomatic Decision-Making

Paper • 2401.03408 • Published Jan 7, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Paper • 2403.03218 • Published Mar 5, 2024 • 1

Welfare Diplomacy: Benchmarking Language Model Cooperation

Paper • 2310.08901 • Published Oct 13, 2023

Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Paper • 2406.04391 • Published Jun 6, 2024 • 9