AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons • Paper • 2503.05731 • Published Feb 19
SciSafeEval: A Comprehensive Benchmark for Safety Alignment of Large Language Models in Scientific Tasks • Paper • 2410.03769 • Published Oct 2, 2024