MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling Paper • 2511.11793 • Published 14 days ago • 154
MiroThinker-v1.0 Collection Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling • 7 items • Updated 10 days ago • 39
InternVL3.5-Core Collection This collection includes only the InternVL3.5 checkpoints that have completed the full training pipeline (i.e., Pretraining, SFT, MPO, and Cascade RL). • 30 items • Updated Sep 28 • 12
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published Jul 25 • 31
OmniCorpus Collection A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text • 6 items • Updated Sep 28 • 3
ZeroGUI: Automating Online GUI Learning at Zero Human Cost Paper • 2505.23762 • Published May 29 • 45
A Simple Aerial Detection Baseline of Multimodal Language Models Paper • 2501.09720 • Published Jan 16 • 2
Scalable Vision Language Model Training via High Quality Data Curation Paper • 2501.05952 • Published Jan 10 • 5
OmniCorpus 🐳 Collection [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text (https://github.com/OpenGVLab/OmniCorpus) • 5 items • Updated May 14 • 1
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Paper • 2504.02826 • Published Apr 3 • 68
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Paper • 2406.08418 • Published Jun 12, 2024 • 31
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 14 items • Updated Oct 22 • 62
InternChat: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language Paper • 2305.05662 • Published May 9, 2023 • 4