Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models Paper • 2603.25750 • Published 16 days ago • 35
4DGS360: 360° Gaussian Reconstruction of Dynamic Objects from a Single Video Paper • 2603.21618 • Published 13 days ago • 15
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 12 days ago • 120
Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models Paper • 2603.22212 • Published 12 days ago • 124
PEARL: Personalized Streaming Video Understanding Model Paper • 2603.20422 • Published 15 days ago • 40
SNAP: Speaker Nulling for Artifact Projection in Speech Deepfake Detection Paper • 2603.20686 • Published 15 days ago • 3
3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model Paper • 2603.18524 • Published 17 days ago • 58
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published 19 days ago • 152
On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published Feb 24 • 101