Collections including paper arxiv:2508.19205

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26 • 123

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 305
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4 • 258
DINOv3

Paper • 2508.10104 • Published Aug 13 • 274

Bugai's Collection

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28 • 89
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24 • 80
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26 • 41
VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26 • 123

LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update

Paper • 2106.13914 • Published Jun 26, 2021 • 1
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges

Paper • 2506.15196 • Published Jun 18 • 3
Ascend HiFloat8 Format for Deep Learning

Paper • 2409.16626 • Published Sep 25, 2024 • 1
Recipes for Pre-training LLMs with MXFP8

Paper • 2506.08027 • Published May 30 • 1

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Paper • 2412.15322 • Published Dec 19, 2024 • 20
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5 • 85
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

Paper • 2505.02625 • Published May 5 • 22
Fast Text-to-Audio Generation with Adversarial Post-Training

Paper • 2505.08175 • Published May 13 • 24

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26 • 123

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26 • 123

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26 • 123
IndexTTS 2 Demo

Space • Running on Zero • 445

Generate expressive speech from text with emotion control

Frontier Text-to-Speech Models https://microsoft.github.io/VibeVoice/

microsoft/VibeVoice-1.5B

Text-to-Speech • 3B • Updated Sep 1 • 236k • 1.93k
VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26 • 123

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28, 2024 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30, 2024 • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5, 2024 • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 21

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26 • 123

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26 • 123