Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach Paper • 2512.02834 • Published 2 days ago • 28
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors Paper • 2505.24625 • Published May 30 • 9
F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions Paper • 2509.06951 • Published Sep 8 • 31
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Paper • 2508.21112 • Published Aug 28 • 77
EO-Robotics Collection: EmbodiedOneVision is a unified framework for multimodal embodied reasoning and robot control, featuring interleaved vision-text-action pretraining. • 7 items • Updated 2 days ago • 8
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation Paper • 2508.05635 • Published Aug 7 • 73
Hume: Introducing System-2 Thinking in Visual-Language-Action Model Paper • 2505.21432 • Published May 27 • 4
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation Paper • 2506.18095 • Published Jun 22 • 66
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 273
DexUMI: Using Human Hand as the Universal Manipulation Interface for Dexterous Manipulation Paper • 2505.21864 • Published May 28 • 9
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding Paper • 2505.22618 • Published May 28 • 44
Article: Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 • Aug 21, 2024 • 42
Heimdall: test-time scaling on the generative verification Paper • 2504.10337 • Published Apr 14 • 33
UniF²ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models Paper • 2503.08120 • Published Mar 11 • 31