RobotArena ∞: Scalable Robot Benchmarking via Real-to-Sim Translation Paper • 2510.23571 • Published 1 day ago • 5
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Paper • 2510.20579 • Published 5 days ago • 50
Unified Reinforcement and Imitation Learning for Vision-Language Models Paper • 2510.19307 • Published 7 days ago • 24
PICABench: How Far Are We from Physically Realistic Image Editing? Paper • 2510.17681 • Published 8 days ago • 60
LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal Paper • 2510.15868 • Published 11 days ago • 23
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published 11 days ago • 80
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset Paper • 2510.15742 • Published 11 days ago • 49
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning Paper • 2510.15110 • Published 12 days ago • 15
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model Paper • 2510.10274 • Published 17 days ago • 13
SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models Paper • 2510.06917 • Published 20 days ago • 34
WoW: Towards a World omniscient World model Through Embodied Interaction Paper • 2509.22642 • Published Sep 26 • 11
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Paper • 2508.21112 • Published Aug 28 • 75
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning Paper • 2508.20751 • Published Aug 28 • 89
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies Paper • 2508.20072 • Published Aug 27 • 30