InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition Paper • 2309.15112 • Published Sep 26, 2023 • 2
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization Paper • 2311.16839 • Published Nov 28, 2023 • 1
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model Paper • 2401.16420 • Published Jan 29, 2024 • 55
MLLM-DataEngine: An Iterative Refinement Approach for MLLM Paper • 2308.13566 • Published Aug 25, 2023 • 1
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9, 2024 • 30
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3, 2024 • 95
MinerU: An Open-Source Solution for Precise Document Content Extraction Paper • 2409.18839 • Published Sep 27, 2024 • 35
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26 • 136
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26 • 136
Shifting AI Efficiency From Model-Centric to Data-Centric Compression Paper • 2505.19147 • Published May 25 • 144
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Paper • 2412.07626 • Published Dec 10, 2024 • 28
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Paper • 2412.07626 • Published Dec 10, 2024 • 28