Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning
Abstract
A dual-penalty reinforcement learning framework reduces overthinking in Large Reasoning Models by addressing internal and external redundancy, leading to more concise and interpretable reasoning traces without significant accuracy loss.
Large Reasoning Models (LRMs) often produce excessively verbose reasoning traces, a phenomenon known as overthinking, which hampers both efficiency and interpretability. Prior work primarily addresses this issue by reducing response length, without fully examining the underlying semantic structure of the reasoning process. In this paper, we revisit overthinking by decomposing it into two distinct forms: internal redundancy, which consists of low-contribution reasoning steps within the first correct solution (FCS), and external redundancy, which refers to unnecessary continuation after the FCS. To mitigate both forms, we propose a dual-penalty reinforcement learning framework. For internal redundancy, we adopt a sliding-window semantic analysis to penalize low-gain reasoning steps that contribute little toward reaching the correct answer. For external redundancy, we penalize the proportion of the response that extends beyond the FCS to encourage earlier termination. Our method significantly compresses reasoning traces with minimal accuracy loss, and generalizes effectively to out-of-domain tasks such as question answering and code generation. Crucially, we find that external redundancy can be safely removed without degrading performance, whereas internal redundancy must be reduced more cautiously to avoid impairing correctness. These findings suggest that our method not only improves reasoning efficiency but also enables implicit, semantic-aware control over Chain-of-Thought length, paving the way for more concise and interpretable LRMs.
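As a rough illustration of the two penalties described in the abstract, the sketch below shows how a per-response reward could combine an internal penalty over low-gain steps within the FCS and an external penalty proportional to the text generated after the FCS. This is not the authors' implementation: the function names, the embedding-similarity proxy for "semantic gain", the sliding-window size, and all weights and thresholds are assumptions made for illustration.

```python
# Hypothetical sketch of a dual-penalty reward, assuming a cosine-similarity
# proxy for per-step semantic gain; names and hyperparameters are illustrative.
import numpy as np


def semantic_gain(step_embs: np.ndarray, answer_emb: np.ndarray, window: int = 3) -> np.ndarray:
    """For each reasoning step, estimate how much it moves the trace closer to
    the final answer, relative to the best similarity in a sliding window of
    preceding steps."""
    sims = step_embs @ answer_emb / (
        np.linalg.norm(step_embs, axis=1) * np.linalg.norm(answer_emb) + 1e-8
    )
    gains = np.zeros_like(sims)
    for i in range(len(sims)):
        prev = sims[max(0, i - window):i]
        gains[i] = sims[i] - (prev.max() if len(prev) else 0.0)
    return gains


def dual_penalty_reward(
    is_correct: bool,
    fcs_step_embs: np.ndarray,   # embeddings of steps up to the first correct solution
    answer_emb: np.ndarray,      # embedding of the reference answer
    n_tokens_total: int,
    n_tokens_fcs: int,
    alpha: float = 0.3,          # weight of the internal (low-gain step) penalty
    beta: float = 0.5,           # weight of the external (post-FCS) penalty
    gain_threshold: float = 0.02,
) -> float:
    """Correctness reward minus penalties for internal and external redundancy."""
    base = 1.0 if is_correct else 0.0

    # Internal redundancy: fraction of FCS steps whose semantic gain is negligible.
    gains = semantic_gain(fcs_step_embs, answer_emb)
    internal = float((gains < gain_threshold).mean()) if len(gains) else 0.0

    # External redundancy: fraction of the response generated after the FCS.
    external = max(0, n_tokens_total - n_tokens_fcs) / max(1, n_tokens_total)

    return base - alpha * internal - beta * external
```

In an RL fine-tuning loop, a scalar of this form would replace or augment the usual correctness-only reward for each sampled trace, so that traces which pad the FCS with low-gain steps or keep generating after it are scored lower than equally correct but more concise ones.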
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Beyond Token Length: Step Pruner for Efficient and Accurate Reasoning in Large Language Models (2025)
- Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking (2025)
- Mitigating Overthinking through Reasoning Shaping (2025)
- PEAR: Phase Entropy Aware Reward for Efficient Reasoning (2025)
- DRQA: Dynamic Reasoning Quota Allocation for Controlling Overthinking in Reasoning Large Language Models (2025)
- ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code (2025)
- Think Right: Learning to Mitigate Under-Over Thinking via Adaptive, Attentive Compression (2025)