Emu3.5: Native Multimodal Models are World Learners Paper • 2510.26583 • Published 5 days ago • 93
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published 8 days ago • 172
VISTA: A Test-Time Self-Improving Video Generation Agent Paper • 2510.15831 • Published 18 days ago • 20
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation Paper • 2510.14974 • Published 19 days ago • 7
Latent Diffusion Model without Variational Autoencoder Paper • 2510.15301 • Published 18 days ago • 48
Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training Paper • 2510.12586 • Published 21 days ago • 107
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published 22 days ago • 160
TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling Paper • 2510.04533 • Published 29 days ago • 47
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation Paper • 2510.08551 • Published 26 days ago • 31
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published 26 days ago • 121
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers Paper • 2503.06923 • Published Mar 10 • 3
Article 🔥 Announcing FLUX-Juiced: The Fastest Image Generation Endpoint (2.6 times faster)! By PrunaAI and 3 others • Apr 23 • 12
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder Paper • 2509.25182 • Published Sep 29 • 36
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance Paper • 2509.26231 • Published Sep 30 • 17
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time Paper • 2509.25161 • Published Sep 29 • 23
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published Sep 28 • 115
VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction Paper • 2509.19297 • Published Sep 23 • 23
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation Paper • 2509.18824 • Published Sep 23 • 22
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models Paper • 2509.17627 • Published Sep 22 • 65
SPATIALGEN: Layout-guided 3D Indoor Scene Generation Paper • 2509.14981 • Published Sep 18 • 27