Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs Paper • 2310.00582 • Published Oct 1, 2023 • 1
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining Paper • 2401.15896 • Published Jan 29, 2024
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding Paper • 2410.05249 • Published Oct 7, 2024
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories Paper • 2503.08625 • Published Mar 11, 2025 • 27
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance Paper • 2502.18778 • Published Feb 26, 2025 • 1
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction Paper • 2505.02471 • Published May 5, 2025 • 15
Ming-Omni: A Unified Multimodal Model for Perception and Generation Paper • 2506.09344 • Published Jun 11, 2025 • 28
M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning Paper • 2507.08306 • Published Jul 11, 2025
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer Paper • 2510.06590 • Published Oct 2025 • 69