ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents • Paper • 2604.23781 • Published 4 days ago • 30
MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences • Paper • 2603.27813 • Published Mar 29 • 23
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation • Paper • 2502.13092 • Published Feb 18, 2025 • 13