- SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
  Paper • 2405.18503 • Published • 9
- DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
  Paper • 2405.20289 • Published • 11
- LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
  Paper • 2406.02897 • Published • 16
- Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
  Paper • 2406.03344 • Published • 21

Collections including paper arxiv:2406.03344

- EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
  Paper • 2402.17485 • Published • 195
- MusicHiFi: Fast High-Fidelity Stereo Vocoding
  Paper • 2403.10493 • Published • 19
- Music Consistency Models
  Paper • 2404.13358 • Published • 14
- Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
  Paper • 2406.02430 • Published • 38

- Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
  Paper • 2406.03344 • Published • 21
- Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
  Paper • 2406.07522 • Published • 40
- VSSD: Vision Mamba with Non-Causal State Space Duality
  Paper • 2407.18559 • Published • 20

- Music Consistency Models
  Paper • 2404.13358 • Published • 14
- PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
  Paper • 2407.02869 • Published • 21
- LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
  Paper • 2406.02897 • Published • 16
- Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
  Paper • 2406.03344 • Published • 21

- EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks
  Paper • 2402.00892 • Published • 14
- MusicGen Streaming
  🔥 Generate music from text prompts • 276
- Whisper JAX
  👀 Transcribe or translate audio from microphone, file, or YouTube • 145
- Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
  Paper • 2406.03344 • Published • 21