Sascha Kirch
sascha-kirch
AI & ML interests
multi-modal generative deep learning, diffusion models, GANs, foundation models
Organizations
None yet
DL Perception
State-Space models
- Audio Mamba: Bidirectional State Space Model for Audio Representation Learning (Paper • 2406.03344 • Published • 21)
- Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling (Paper • 2406.07522 • Published • 40)
- VSSD: Vision Mamba with Non-Causal State Space Duality (Paper • 2407.18559 • Published • 20)
Diffusion Models
- Instruct-Imagen: Image Generation with Multi-modal Instruction (Paper • 2401.01952 • Published • 32)
- ODIN: A Single Model for 2D and 3D Perception (Paper • 2401.02416 • Published • 13)
- Bigger is not Always Better: Scaling Properties of Latent Diffusion Models (Paper • 2404.01367 • Published • 22)
- Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models (Paper • 2404.02747 • Published • 13)
Foundation Models
- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance (Paper • 2404.04125 • Published • 29)
- Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies (Paper • 2404.08197 • Published • 29)
- Probing the 3D Awareness of Visual Foundation Models (Paper • 2404.08636 • Published • 14)
- AM-RADIO: Agglomerative Model -- Reduce All Domains Into One (Paper • 2312.06709 • Published • 2)
3D Reconstruction
- DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaption by Combining 3D GANs and Diffusion Priors (Paper • 2312.16837 • Published • 6)
- Learning the 3D Fauna of the Web (Paper • 2401.02400 • Published • 11)
- Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model (Paper • 2310.15110 • Published • 3)
- Zero-1-to-3: Zero-shot One Image to 3D Object (Paper • 2303.11328 • Published • 5)