GigaBrain-0: A World Model-Powered Vision-Language-Action Model Paper • 2510.19430 • Published 6 days ago • 42
A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment Paper • 2504.15585 • Published Apr 22 • 12
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation Paper • 2503.24026 • Published Mar 31
RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer Paper • 2505.23171 • Published May 29 • 3
WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration Paper • 2506.20590 • Published Jun 25
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models Paper • 2510.01623 • Published 26 days ago • 7
Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization Paper • 2509.10140 • Published Sep 12 • 1
DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion Paper • 2510.15264 • Published 11 days ago • 1
OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation Paper • 2412.11183 • Published Dec 15, 2024
HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration Paper • 2504.03536 • Published Apr 4 • 13
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model Paper • 2411.19108 • Published Nov 28, 2024 • 20
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation Paper • 2410.13571 • Published Oct 17, 2024 • 1
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond Paper • 2405.03520 • Published May 6, 2024 • 1
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published Nov 13, 2024 • 26