Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning Paper • 2309.07911 • Published Sep 14, 2023
PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework Paper • 2211.11629 • Published Nov 21, 2022
SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery Paper • 2312.10115 • Published Dec 15, 2023 • 1
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction Paper • 2505.02471 • Published May 5, 2025 • 15
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping Paper • 2503.21817 • Published Mar 26, 2025 • 1
Ming-Omni: A Unified Multimodal Model for Perception and Generation Paper • 2506.09344 • Published Jun 11, 2025 • 28
GUI-Shepherd: Reliable Process Reward and Verification for Long-Sequence GUI Tasks Paper • 2509.23738 • Published Sep 28, 2025 • 1
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer Paper • 2510.06590 • Published 23 days ago • 70