Emo-Codec

community

AI & ML interests

Codec Language Models, Speech, Emotion Recognition

Recent Activity

dlion168 authored 16 papers about 1 month ago

Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models

Paper • 2408.07665 • Published Aug 14, 2024

EMO-Debias: Benchmarking Gender Debiasing Techniques in Multi-Label Speech Emotion Recognition

Paper • 2506.04652 • Published Jun 5 • 1

Multi-Distillation from Speech and Music Representation Models

Paper • 2506.07237 • Published Jun 8

MMMOS: Multi-domain Multi-axis Audio Quality Assessment

Paper • 2507.04094 • Published Jul 5

ASTAR-NTU solution to AudioMOS Challenge 2025 Track1

Paper • 2507.09904 • Published Jul 14

Fake-Mamba: Real-Time Speech Deepfake Detection Using Bidirectional Mamba as Self-Attention's Alternative

Paper • 2508.09294 • Published Aug 12

Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning

Paper • 2505.16220 • Published May 22 • 1

Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems

Paper • 2509.13989 • Published Sep 17 • 3

CO-VADA: A Confidence-Oriented Voice Augmentation Debiasing Approach for Fair Speech Emotion Recognition

Paper • 2506.06071 • Published Jun 6 • 1

MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model

Paper • 2509.20706 • Published Sep 25 • 2

Codec-SUPERB: An In-Depth Analysis of Sound Codec Models

Paper • 2402.13071 • Published Feb 20, 2024 • 1

Towards audio language modeling -- an overview

Paper • 2402.13236 • Published Feb 20, 2024

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Paper • 2411.05361 • Published Nov 8, 2024 • 3

BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights

Paper • 2501.17790 • Published Jan 29 • 3

ToxicTone: A Mandarin Audio Dataset Annotated for Toxicity and Toxic Utterance Tonality

Paper • 2505.15773 • Published May 21 • 1

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment

Paper • 2507.02768 • Published Jul 3 • 18

Rezzsl updated a dataset over 1 year ago

Emo-Codec/CREMA-D_synth

Updated Feb 24, 2024 • 112k • 13 • 1