Video-As-Prompt: Unified Semantic Control for Video Generation • Paper • 2510.20888 • Published 6 days ago • 41
MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation • Paper • 2510.18692 • Published 9 days ago • 38
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling • Paper • 2510.09212 • Published 20 days ago • 14
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation • Paper • 2510.08673 • Published 20 days ago • 120
Diffusion Transformers with Representation Autoencoders • Paper • 2510.11690 • Published 16 days ago • 160
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer • Paper • 2508.10893 • Published Aug 14 • 31
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels • Paper • 2507.21809 • Published Jul 29 • 131
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again • Paper • 2507.22058 • Published Jul 29 • 38
Epona: Autoregressive Diffusion World Model for Autonomous Driving • Paper • 2506.24113 • Published Jun 30 • 1
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer • Paper • 2408.06072 • Published Aug 12, 2024 • 39
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing • Paper • 2312.07409 • Published Dec 12, 2023 • 23
CogAgent: A Visual Language Model for GUI Agents • Paper • 2312.08914 • Published Dec 14, 2023 • 31