TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis • Paper • 2508.13618 • Published Aug 19
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation • Paper • 2506.18095 • Published Jun 22
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information • Paper • 2503.05085 • Published Mar 7