Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence Paper • 2510.20470 • Published 5 days ago • 11
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Paper • 2510.20579 • Published 5 days ago • 50
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding Paper • 2504.13180 • Published Apr 17 • 19
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts Paper • 2508.09848 • Published Aug 13 • 66
SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension Paper • 2508.01959 • Published Aug 3 • 56
view article Article 🤔👀🎬🖥️📖 Kimi-VL-A3B-Thinking-2506: A Quick Navigation By moonshotai and 1 other • Jun 21 • 69
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? Paper • 2505.23359 • Published May 29 • 39
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models Paper • 2311.17404 • Published Nov 29, 2023 • 1
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Paper • 2503.10615 • Published Mar 13 • 17
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published Mar 10 • 61
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13 • 193