EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection Paper • 2506.09827 • Published Jun 11
NextCoder Collection NextCoder family of code-editing LMs developed with Selective Knowledge Transfer and its training data. • 6 items • Updated Jul 9
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21
Qwen2.5-1M Collection The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated Jul 21
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis Paper • 2410.23320 • Published Oct 30, 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer Paper • 2409.00750 • Published Sep 1, 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion Paper • 2409.10058 • Published Sep 16, 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9, 2024