Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published 13 days ago • 157
DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training Paper • 2510.11712 • Published 13 days ago • 30
FlashWorld: High-quality 3D Scene Generation within Seconds Paper • 2510.13678 • Published 11 days ago • 69
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published 17 days ago • 117
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation Paper • 2510.02283 • Published 24 days ago • 91
VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models Paper • 2509.17985 • Published Sep 22 • 25
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation Paper • 2509.18824 • Published Sep 23 • 22
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space Paper • 2508.19247 • Published Aug 26 • 41
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis Paper • 2509.10441 • Published Sep 12 • 30
LuxDiT: Lighting Estimation with Video Diffusion Transformer Paper • 2509.03680 • Published Sep 3 • 16
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models Paper • 2508.09968 • Published Aug 13 • 15
Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation Paper • 2508.07981 • Published Aug 11 • 58