Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems • Paper • 2512.24385 • Published 4 days ago
An Information Theoretic Perspective on Agentic System Design • Paper • 2512.21720 • Published 9 days ago
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone • Paper • 2512.22615 • Published 7 days ago
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents • Paper • 2512.22322 • Published 8 days ago
Yume-1.5: A Text-Controlled Interactive World Generation Model • Paper • 2512.22096 • Published 8 days ago
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models • Paper • 2512.20557 • Published 11 days ago
Reinforcement Learning for Self-Improving Agent with Skill Library • Paper • 2512.17102 • Published 16 days ago
SpatialTree: How Spatial Abilities Branch Out in MLLMs • Paper • 2512.20617 • Published 11 days ago
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion • Paper • 2512.19678 • Published 12 days ago
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence • Paper • 2512.10863 • Published 23 days ago
Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image • Paper • 2512.16899 • Published 16 days ago
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence • Paper • 2512.16793 • Published 16 days ago
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs • Paper • 2512.17008 • Published 16 days ago
AdaTooler-V: Adaptive Tool-Use for Images and Videos • Paper • 2512.16918 • Published 16 days ago
Next-Embedding Prediction Makes Strong Vision Learners • Paper • 2512.16922 • Published 16 days ago
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models • Paper • 2512.16561 • Published 16 days ago
DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models • Paper • 2512.15713 • Published 17 days ago