BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks Paper • 2510.02418 • Published 28 days ago • 2
Adaptive Evaluations Collection Datasets for our paper, Adaptively profiling models with task elicitation (EMNLP 2025). • 1 item • Updated Sep 20