UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View Paper • 2303.15083 • Published Mar 27, 2023
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 3 days ago • 45
DSR_Suite Collection Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models • 3 items • Updated 3 days ago • 6
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives Paper • 2512.14699 • Published 10 days ago • 27
TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs Paper • 2512.14698 • Published 10 days ago • 18
ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries Paper • 2511.14349 • Published Nov 18 • 17
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model Paper • 2510.19871 • Published Oct 22 • 29
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17 • 89