Emu3.5: Native Multimodal Models are World Learners Paper • 2510.26583 • Published about 22 hours ago • 51
FlashWorld: High-quality 3D Scene Generation within Seconds Paper • 2510.13678 • Published 16 days ago • 69
FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution Paper • 2510.12747 • Published 17 days ago • 36
InfiniHuman: Infinite 3D Human Creation with Precise Control Paper • 2510.11650 • Published 18 days ago • 5
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published 22 days ago • 120
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels Paper • 2510.06499 • Published 24 days ago • 31
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning Paper • 2510.08555 • Published 22 days ago • 62
Triangle Splatting+: Differentiable Rendering with Opaque Triangles Paper • 2509.25122 • Published Sep 29 • 8
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time Paper • 2509.25161 • Published Sep 29 • 23
Mapanything 🐠 Space • Gradio • Running on Zero • Convert images to 3D models and visualize depth and normals • 49
Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis Paper • 2509.09595 • Published Sep 11 • 48
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing Paper • 2508.10881 • Published Aug 14 • 52
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models Paper • 2508.02095 • Published Aug 4 • 9
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh Paper • 2508.01242 • Published Aug 2 • 10
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts Paper • 2507.20939 • Published Jul 28 • 56
EarthCrafter: Scalable 3D Earth Generation via Dual-Sparse Latent Diffusion Paper • 2507.16535 • Published Jul 22 • 20