EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs Paper • 2509.09174 • Published Sep 11 • 57
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion Paper • 2506.01111 • Published Jun 1 • 30
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information Paper • 2503.05085 • Published Mar 7 • 47