ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation Paper • 2511.01163 • Published 1 day ago • 16
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs Paper • 2506.10128 • Published Jun 11 • 22
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning Paper • 2506.05523 • Published Jun 5 • 34
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset Paper • 2410.22325 • Published Oct 29, 2024 • 10
Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion Paper • 2407.10973 • Published Jul 15, 2024 • 11