Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents Paper • 2510.24702 • Published Oct 28, 2025 • 28
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge Paper • 2506.21506 • Published Jun 26, 2025 • 51
BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning Paper • 2505.23883 • Published May 29, 2025 • 2
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills Paper • 2504.07079 • Published Apr 9, 2025 • 12
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31, 2025 • 301
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models Paper • 2502.14802 • Published Feb 20, 2025 • 13
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20, 2025 • 45
Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models Paper • 2502.06755 • Published Feb 10, 2025 • 7
UGround Collection Navigating GUIs as Humans Do: Universal Visual Grounding for GUI Agents (ICLR'25 Oral) • 10 items • Updated May 4, 2025 • 7
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery Paper • 2410.05080 • Published Oct 7, 2024 • 21
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents Paper • 2410.05243 • Published Oct 7, 2024 • 20
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization Paper • 2405.15071 • Published May 23, 2024 • 41
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions Paper • 2403.19651 • Published Mar 28, 2024 • 25
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error Paper • 2403.04746 • Published Mar 7, 2024 • 24
TravelPlanner: A Benchmark for Real-World Planning with Language Agents Paper • 2402.01622 • Published Feb 2, 2024 • 37
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI Paper • 2311.16502 • Published Nov 27, 2023 • 37