GigaWorld-0: World Models as Data Engine to Empower Embodied AI • Paper • 2511.19861 • Published Nov 25
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens • Paper • 2511.19418 • Published Nov 24
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence • Paper • 2509.12203 • Published Sep 15
4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation • Paper • 2506.18839 • Published Jun 18
VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation • Paper • 2502.07531 • Published Feb 11
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture • Paper • 2311.10123 • Published Nov 16, 2023
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models • Paper • 2308.04729 • Published Aug 9, 2023