arxiv:2510.10977

Revisiting Model Interpolation for Efficient Reasoning

Published on Oct 13 · Submitted by Taki WU on Oct 16
Abstract

Model merging, typically on Instruct and Thinking models, has shown remarkable performance for efficient reasoning. In this paper, we systematically revisit the simplest merging method that interpolates two weights directly. Particularly, we observe that model interpolation follows a three-stage evolutionary paradigm with distinct behaviors on the reasoning trajectory. These dynamics provide a principled guide for navigating the performance-cost trade-off. Empirical results demonstrate that a strategically interpolated model surprisingly surpasses sophisticated model merging baselines on both efficiency and effectiveness. We further validate our findings with extensive ablation studies on model layers, modules, and decoding strategies. Ultimately, this work demystifies model interpolation and offers a practical framework for crafting models with precisely targeted reasoning capabilities. Code is available at https://github.com/wutaiqiang/MI.

Community

Paper author Paper submitter

Revisiting the simplest method for efficient reasoning: model interpolation between the Instruct and Thinking variants. This paper finds that model interpolation follows a three-stage evolutionary paradigm with distinct behaviors on the reasoning trajectory.

Code: https://github.com/wutaiqiang/MI
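For readers who want a quick feel for the setup, below is a minimal sketch of direct weight interpolation between an Instruct and a Thinking checkpoint. The model names, the λ value, and the output path are illustrative assumptions; see the linked repository for the authors' actual scripts.

```python
# Minimal sketch of direct weight interpolation (illustrative, not the authors' script).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

INSTRUCT = "Qwen/Qwen3-4B-Instruct-2507"   # assumed Instruct variant
THINKING = "Qwen/Qwen3-4B-Thinking-2507"   # assumed Thinking variant
LAM = 0.5                                  # lambda = 0 -> pure Instruct, 1 -> pure Thinking

instruct = AutoModelForCausalLM.from_pretrained(INSTRUCT, torch_dtype=torch.bfloat16)
thinking = AutoModelForCausalLM.from_pretrained(THINKING, torch_dtype=torch.bfloat16)

thinking_sd = thinking.state_dict()
merged_sd = {}
for name, w_inst in instruct.state_dict().items():
    if torch.is_floating_point(w_inst):
        # theta_merged = (1 - lambda) * theta_instruct + lambda * theta_thinking
        merged_sd[name] = (1.0 - LAM) * w_inst.float() + LAM * thinking_sd[name].float()
    else:
        merged_sd[name] = w_inst  # leave integer buffers untouched

instruct.load_state_dict(merged_sd)
instruct.save_pretrained(f"qwen3-4b-interp-{LAM}")
AutoTokenizer.from_pretrained(THINKING).save_pretrained(f"qwen3-4b-interp-{LAM}")
```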

Paper author Paper submitter


Stage #1. Corresponding to λ ∈ [0, 0.4) for the Qwen3-4B model. In this initial phase, the merged model is dominated by the Instruct model but begins to incorporate traits from the Thinking model, thus generating longer outputs without adopting an explicit thinking process.

Stage #2. Corresponding to λ ∈ [0.4, 0.6] for Qwen3-4B. In this stage, the reasoning pattern of the Thinking model rapidly emerges, leading to a sharp increase in Mean@k and gentler increases in Pass@k and Token #N. This stage marks a critical and dramatic phase transition.

Stage #3. Corresponding to λ ∈ (0.6, 1.0] for Qwen3-4B. In this final stage, the merged model converges to the pure Thinking model, with Token #N continuing to increase while Pass@k and Mean@k change only slightly.
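A rough way to see which stage a given λ lands in is to generate from the interpolated checkpoint and inspect (a) the output length and (b) whether an explicit thinking block appears. Below is a minimal probing sketch, assuming an interpolated model has been saved locally (e.g. by the snippet above) and that the Thinking variant marks its reasoning with a <think> ... </think> block; the prompt and generation settings are illustrative, not the paper's evaluation protocol.

```python
# Rough stage probe for an interpolated checkpoint (illustrative assumptions throughout).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "qwen3-4b-interp-0.5"   # assumed local path of an interpolated model
tok = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=2048, do_sample=False)

completion = tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=False)
print("new tokens :", out.shape[-1] - inputs.shape[-1])  # proxy for Token #N
print("has <think>:", "<think>" in completion)           # explicit thinking pattern?
```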

Paper author Paper submitter

Similar three-stage behavior is observed for Qwen3-30B-A3B.

Paper author Paper submitter


If we skip the FFN when interpolating, the ratio of outputs generating the explicit <think> pattern decreases to near 0.
-> The FFN is vital for the thinking pattern!
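To make the module-level ablation concrete, here is a minimal sketch that interpolates every parameter except the FFN (MLP) modules. The module-name filter (".mlp.") and the choice of keeping the Instruct weights for the skipped modules are assumptions for illustration, not the paper's exact ablation protocol.

```python
# Illustrative module-wise ablation: interpolate everything except FFN/MLP weights.
import torch
from transformers import AutoModelForCausalLM

INSTRUCT = "Qwen/Qwen3-4B-Instruct-2507"   # assumed checkpoints, as above
THINKING = "Qwen/Qwen3-4B-Thinking-2507"
LAM = 0.5

instruct = AutoModelForCausalLM.from_pretrained(INSTRUCT, torch_dtype=torch.bfloat16)
thinking = AutoModelForCausalLM.from_pretrained(THINKING, torch_dtype=torch.bfloat16)

thinking_sd = thinking.state_dict()
merged_sd = {}
for name, w_inst in instruct.state_dict().items():
    skip_ffn = ".mlp." in name   # Qwen3 FFN modules live under "*.mlp.*"
    if skip_ffn or not torch.is_floating_point(w_inst):
        merged_sd[name] = w_inst  # keep Instruct weights for skipped modules
    else:
        merged_sd[name] = (1.0 - LAM) * w_inst.float() + LAM * thinking_sd[name].float()

instruct.load_state_dict(merged_sd)
instruct.save_pretrained(f"qwen3-4b-interp-{LAM}-no-ffn")
```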

Very interesting findings!

