NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents • Paper • 2510.07172 • Published Oct 8 • 28
Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision • Paper • 2507.20976 • Published Jul 28 • 10
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors • Paper • 2505.23001 • Published May 29 • 8