OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17, 2025
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations Paper • 2508.09789 • Published Aug 13, 2025
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study Paper • 2508.13142 • Published Aug 18, 2025
StyleMM: Stylized 3D Morphable Face Model via Text-Driven Aligned Image Translation Paper • 2508.11203 • Published Aug 15, 2025
FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation Paper • 2508.11255 • Published Aug 15, 2025
TexVerse: A Universe of 3D Objects with High-Resolution Textures Paper • 2508.10868 • Published Aug 14, 2025
SPARSE Data, Rich Results: Few-Shot Semi-Supervised Learning via Class-Conditioned Image Translation Paper • 2508.06429 • Published Aug 8, 2025
MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data Paper • 2508.10894 • Published Aug 14, 2025