Abstract
Model merging, typically between Instruct and Thinking models, has shown remarkable performance for efficient reasoning. In this paper, we systematically revisit the simplest merging method: directly interpolating the two sets of weights. In particular, we observe that model interpolation follows a three-stage evolutionary paradigm with distinct behaviors along the reasoning trajectory. These dynamics provide a principled guide for navigating the performance-cost trade-off. Empirical results demonstrate that a strategically interpolated model surprisingly surpasses sophisticated model-merging baselines in both efficiency and effectiveness. We further validate our findings with extensive ablation studies on model layers, modules, and decoding strategies. Ultimately, this work demystifies model interpolation and offers a practical framework for crafting models with precisely targeted reasoning capabilities. Code is available at https://github.com/wutaiqiang/MI.
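For concreteness, the direct interpolation studied here can be written as a convex combination of the two models' weights. The notation below is ours, assuming (consistent with the stage descriptions in the community summary) that λ = 0 recovers the Instruct model and λ = 1 the Thinking model:

$$\theta(\lambda) = (1-\lambda)\,\theta_{\mathrm{Instruct}} + \lambda\,\theta_{\mathrm{Thinking}}, \qquad \lambda \in [0, 1].$$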
Community
Revisiting the simplest method for efficient reasoning: model interpolation between Instruct and Thinking variants. This paper finds that model interpolation follows a three-stage evolutionary paradigm with distinct behaviors on the reasoning trajectory; a minimal merging sketch follows the three stages below.
Stage #1. Corresponding to λ ∈ [0, 0.4) for the Qwen3-4B model. In this initial phase, the merged model is dominated by the Instruct model but begins to incorporate traits from the Thinking model, thus generating longer outputs without adopting an explicit thinking process.
Stage #2. Corresponding to λ ∈ [0.4, 0.6] for Qwen3-4B. In this stage, the reasoning pattern of the Thinking model rapidly emerges, leading to a sharp increase in Mean@k and gentler increases in Pass@k and Token #N. This stage marks a critical and dramatic phase transition.
Stage #3. Corresponding to λ ∈ (0.6, 1.0] for Qwen3-4B. In this final stage, the merged model converges to the pure Thinking model, with continuously increasing Token #N and a slight change in Pass@k and Mean@k.
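For readers who want to reproduce the basic operation, here is a minimal PyTorch/Transformers sketch of direct weight interpolation. The model identifiers and λ value are illustrative assumptions, not taken from the paper or its repository, and the snippet assumes the two checkpoints share an identical architecture and parameter names.

```python
# Minimal sketch of direct weight interpolation:
#   theta(lam) = (1 - lam) * theta_Instruct + lam * theta_Thinking
# The Hugging Face model IDs and lam = 0.5 are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

INSTRUCT_ID = "Qwen/Qwen3-4B-Instruct-2507"  # assumed Instruct variant
THINKING_ID = "Qwen/Qwen3-4B-Thinking-2507"  # assumed Thinking variant
lam = 0.5  # falls in the Stage #2 region [0.4, 0.6] described above

instruct = AutoModelForCausalLM.from_pretrained(INSTRUCT_ID, torch_dtype=torch.bfloat16)
thinking = AutoModelForCausalLM.from_pretrained(THINKING_ID, torch_dtype=torch.bfloat16)

# Interpolate every parameter tensor element-wise (assumes matching key sets).
thinking_state = thinking.state_dict()
merged_state = {
    name: (1.0 - lam) * w_instruct + lam * thinking_state[name]
    for name, w_instruct in instruct.state_dict().items()
}

# Reuse the Instruct model object as the container for the merged weights.
instruct.load_state_dict(merged_state)
out_dir = f"qwen3-4b-interp-lam{lam}"
instruct.save_pretrained(out_dir)
AutoTokenizer.from_pretrained(INSTRUCT_ID).save_pretrained(out_dir)
```

Sweeping lam across [0, 1] and evaluating Token #N, Pass@k, and Mean@k at each point is the kind of procedure that would trace out the three-stage trajectory described above.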
Very interesting findings!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging (2025)
- ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code (2025)
- ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization (2025)
- Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking (2025)
- TORSO: Template-Oriented Reasoning Towards General Tasks (2025)
- A2R: An Asymmetric Two-Stage Reasoning Framework for Parallel Reasoning (2025)
- Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm (2025)