arxiv:2510.02780

Reasoning Riddles: How Explainability Reveals Cognitive Limits in Vision-Language Models

Published on Oct 3

Abstract

A comprehensive analysis of Vision-Language Models' reasoning processes on rebus puzzles reveals significant strengths and limitations, highlighting the importance of explainability in model evaluation.

AI-generated summary

Vision-Language Models (VLMs) excel at many multimodal tasks, yet their cognitive processes remain opaque on complex lateral thinking challenges like rebus puzzles. While recent work has demonstrated that these models struggle significantly with rebus puzzle solving, the underlying reasoning processes and failure patterns remain largely unexplored. We address this gap through a comprehensive explainability analysis that moves beyond performance metrics to understand how VLMs approach these puzzles. Our study contributes a systematically annotated dataset of 221 rebus puzzles across six cognitive categories, paired with an evaluation framework that separates reasoning quality from answer correctness. We investigate three prompting strategies designed to elicit different types of explanation, revealing critical insights into VLM cognitive processes. Our findings demonstrate that reasoning quality varies dramatically across puzzle categories, with models showing systematic strengths in visual composition while exhibiting fundamental limitations in absence interpretation and cultural symbolism. We also discover that prompting strategy substantially influences both cognitive approach and problem-solving effectiveness, establishing explainability as an integral component of model performance rather than a post-hoc consideration.
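
As a rough illustration of how an evaluation that separates reasoning quality from answer correctness across prompting strategies might be organized, the Python sketch below is not the authors' code: `query_vlm`, the three prompt variants, and the keyword-overlap quality proxy are hypothetical stand-ins for the paper's actual models, prompts, and annotation-based rubric.

```python
# Illustrative sketch only. Scores reasoning quality separately from answer
# correctness, per prompting strategy and puzzle category. All names below
# (query_vlm, the prompts, the quality proxy) are assumptions for illustration.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Puzzle:
    image_path: str   # rebus image
    answer: str       # gold answer (word or phrase)
    category: str     # e.g. "visual composition", "absence", "cultural symbolism"

PROMPTS = {
    "direct":       "Solve this rebus puzzle. Give only the answer.",
    "step_by_step": "Describe each visual element, then reason step by step to the answer.",
    "self_explain": "Solve the rebus, then explain which cues led you to the answer.",
}

def query_vlm(image_path: str, prompt: str) -> tuple[str, str]:
    """Placeholder for a VLM call; returns (final_answer, reasoning_text)."""
    raise NotImplementedError

def answer_correct(predicted: str, gold: str) -> bool:
    return predicted.strip().lower() == gold.strip().lower()

def reasoning_quality(reasoning: str, puzzle: Puzzle) -> float:
    """Toy proxy: fraction of gold-answer words mentioned in the reasoning.
    The paper instead scores explanations against annotated quality criteria."""
    words = puzzle.answer.lower().split()
    hits = sum(w in reasoning.lower() for w in words)
    return hits / max(len(words), 1)

def evaluate(puzzles: list[Puzzle]) -> dict:
    cells = defaultdict(lambda: {"correct": 0, "quality": 0.0, "n": 0})
    for puzzle in puzzles:
        for strategy, prompt in PROMPTS.items():
            answer, reasoning = query_vlm(puzzle.image_path, prompt)
            key = (strategy, puzzle.category)
            cells[key]["correct"] += answer_correct(answer, puzzle.answer)
            cells[key]["quality"] += reasoning_quality(reasoning, puzzle)
            cells[key]["n"] += 1
    # Average per (strategy, category) cell so accuracy and reasoning quality
    # can diverge, e.g. sound reasoning that still misses the final answer.
    return {
        key: {"accuracy": v["correct"] / v["n"], "avg_quality": v["quality"] / v["n"]}
        for key, v in cells.items()
    }
```

The point this sketch mirrors from the abstract is that accuracy and reasoning quality are tracked as separate quantities per prompting strategy and cognitive category, so one can vary independently of the other.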
