DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning Paper • 2602.16742 • Published 29 days ago • 12
From Perception to Action: An Interactive Benchmark for Vision Reasoning Paper • 2602.21015 • Published 23 days ago • 23
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs Paper • 2603.09906 • Published 9 days ago • 70
RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies Paper • 2603.04639 • Published 14 days ago • 27
VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection Paper • 2603.00912 • Published 18 days ago • 39
RealWonder: Real-Time Physical Action-Conditioned Video Generation Paper • 2603.05449 • Published 13 days ago • 12
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence Paper • 2603.07660 • Published 11 days ago • 82
CAST: Modeling Visual State Transitions for Consistent Video Retrieval Paper • 2603.08648 • Published 10 days ago • 4
MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents Paper • 2603.09827 • Published 9 days ago • 28