Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward Paper • 2511.20561 • Published 5 days ago • 31
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27 • 172
PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning Paper • 2510.13809 • Published Oct 15 • 36
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement Paper • 2504.01934 • Published Apr 2 • 22
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance Paper • 2412.06673 • Published Dec 9, 2024 • 11
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26, 2024 • 40