Article From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels Aug 18 • 83
Efficient Part-level 3D Object Generation via Dual Volume Packing Paper • 2506.09980 • Published Jun 11 • 7
InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions Paper • 2506.09984 • Published Jun 11 • 14
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning Paper • 2506.08889 • Published Jun 10 • 23
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation Paper • 2506.08570 • Published Jun 10 • 33
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation Paper • 2506.09790 • Published Jun 11 • 53
Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation Paper • 2506.09350 • Published Jun 11 • 48
Seedance 1.0: Exploring the Boundaries of Video Generation Models Paper • 2506.09113 • Published Jun 10 • 102
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training Paper • 2506.05301 • Published Jun 5 • 56
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development Paper • 2506.05010 • Published Jun 5 • 79
Normalized Attention Guidance: Universal Negative Guidance for Diffusion Model Paper • 2505.21179 • Published May 27 • 13
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors Paper • 2505.24625 • Published May 30 • 9
OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning Paper • 2506.00338 • Published May 31 • 10
Cora: Correspondence-aware image editing using few step diffusion Paper • 2505.23907 • Published May 29 • 11
ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding Paper • 2506.01853 • Published Jun 2 • 32
Temporal In-Context Fine-Tuning for Versatile Control of Video Diffusion Models Paper • 2506.00996 • Published Jun 1 • 38
ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL Paper • 2505.24875 • Published May 30 • 10
EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering Paper • 2505.24417 • Published May 30 • 13