# Model Card for Switch Generation

This repository contains the Switch Generation framework and its associated switcher models, as presented in the paper *Don't Throw Away Your Pretrained Model*.
## Model Details

### Model Description
Switch Generation is a model collaboration framework designed to leverage the strengths of both pretrained and aligned language models. It addresses a tradeoff of alignment training: aligned models excel at reasoning and instruction following but may lose creativity and calibration. Switch Generation lets pretrained and aligned model versions dynamically take turns to "speak" within a single response.
Specifically, the framework trains a "switcher LM" on the outcomes of choosing different models to generate the next segment across diverse queries and contexts. At inference time, the switcher LM directs different model checkpoints to generate the next segment wherever their strengths are most needed. Extensive experiments show that model collaboration consistently outperforms individual models, and that Switch Generation improves performance further. The approach discovers compositional skills for solving complex problems, generalizes to unseen models and tasks, and effectively reuses and repurposes by-products of expensive model training.
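To make the segment-level switching concrete, below is a minimal, self-contained sketch of the inference loop. It is an illustration of the idea only: the class and method names (`CandidateModel`, `SwitcherLM`, `generate_segment`, `choose_model`) are hypothetical and do not correspond to the actual implementation in `main_generate.py`.

```python
# Minimal sketch of segment-level switch generation. All names below are
# illustrative placeholders, NOT the actual API of main_generate.py; they
# only mirror the behavior described above.

class CandidateModel:
    """Stand-in for one candidate checkpoint (e.g., the P, F, or A model)."""

    def __init__(self, name):
        self.name = name

    def generate_segment(self, context):
        # A real candidate model would decode tokens until a segment boundary.
        return f"[{self.name} segment] "


class SwitcherLM:
    """Stand-in for the switcher LM that decides which model 'speaks' next."""

    def choose_model(self, query, partial_response, candidates):
        # A real switcher conditions on the query and the partial response;
        # here we simply rotate through the candidates for illustration.
        return candidates[partial_response.count("segment") % len(candidates)]


def switch_generate(query, candidates, switcher, max_segments=6):
    """Let the switcher pick which candidate generates each successive segment."""
    response = ""
    for _ in range(max_segments):
        chosen = switcher.choose_model(query, response, candidates)
        response += chosen.generate_segment(query + response)
    return response


if __name__ == "__main__":
    models = [CandidateModel("P"), CandidateModel("F"), CandidateModel("A")]
    print(switch_generate("Write a short poem about autumn.", models, SwitcherLM()))
```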
- Developed by: Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang
- Model type: Causal Language Model (LoRA adapter) within a Mixture of Experts (MoE) text generation framework.
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: allenai/Llama-3.1-Tulu-3-8B
### Model Sources
- Repository: https://github.com/BunsenFeng/switch_generation
- Paper: https://huggingface.co/papers/2510.09913
## Uses

### Direct Use
Switch Generation is intended to enhance text generation by combining the strengths of multiple language models. It can produce responses that benefit from the instruction-following ability of aligned models and the creativity and calibration of pretrained base models. The switcher LM orchestrates this dynamic collaboration.
### Out-of-Scope Use
This model is not intended for standalone use as a general-purpose text generator outside the full Switch Generation framework, which relies on multiple candidate models. Using the switcher without proper integration into the system may lead to suboptimal performance. Users should also be aware of potential biases inherited from the underlying foundation models used in the collaboration.
## How to Get Started with the Model
To get started with the Switch Generation framework, follow the "Quick Start" instructions from the official GitHub repository.
### Initialization

Create a conda environment for Switch Generation:

```bash
conda env create -f switch.yml
conda activate switch_generation
```

Log into Hugging Face (for model access):

```bash
huggingface-cli login
```

Execute your first Switch Generation inference:

```bash
bash main.sh
```
`main.sh` by default contains:

```bash
python main_generate.py \
    --input data/input_sample.jsonl \
    --gpu_ids 0,1,2,3 \
    --overide_selector_path bunsenfeng/PFA_switcher_1 \
    --total_max_length 256
```
- `--input`: a JSONL file of inputs; see `data/input_sample.jsonl` for an example of how to prepare your custom inputs (a hedged sketch of the format also appears after this list). Output is written to the same directory, as `data/input_sample_switch_generation.jsonl`.
- `--gpu_ids`: a comma-separated string of GPU ids; 4 GPUs are needed (one each for the P, F, and A models and the switcher).
- `--overide_selector_path`: path to the switcher LM on Hugging Face. We provide `bunsenfeng/PFA_switcher_1` and `bunsenfeng/PFA_switcher_2`, which differ in task and training exposure; you can also try the aligned model itself (`allenai/Llama-3.1-Tulu-3-8B`) or any model that can follow instructions.
- `--total_max_length`: essentially `max_new_tokens`.
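As a rough guide to preparing a custom input file, here is a hedged sketch. The `prompt` key is an assumption, not a confirmed schema: check `data/input_sample.jsonl` and mirror its keys exactly.

```python
# Hypothetical example of preparing a custom input file. The real schema is
# defined by data/input_sample.jsonl; the "prompt" key below is an assumption.
import json

rows = [
    {"prompt": "Write a short poem about autumn."},
    {"prompt": "Explain the difference between a list and a tuple in Python."},
]

with open("data/my_inputs.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# After running main.sh with --input data/my_inputs.jsonl, output lands in the
# same directory (e.g., data/my_inputs_switch_generation.jsonl, following the
# naming pattern noted above).
```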
### Other Settings
- Your own data: format it like `data/input_sample.jsonl`.
- Your own candidate models: change lines 46-48 in `main_generate.py` (see the sketch after this list). Make sure `--gpu_ids` provides n+1 GPU ids, where n is the number of candidate models; you are not limited to 3 models. Another recommended set is `["Qwen/Qwen2.5-7B", "bunsenfeng/yuru_qw_oasst1", "Qwen/Qwen2.5-7B-Instruct"]`, where the middle entry is an SFT model we trained.
- What's pending: code for switcher training, code for the paper's evaluations, compatibility features such as running with fewer GPUs than n+1, etc.
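For illustration, swapping in the recommended Qwen candidate set around lines 46-48 of `main_generate.py` might look roughly like the following; the variable name is hypothetical, so match it to whatever the actual code uses.

```python
# Hypothetical sketch only: the variable name below is illustrative; see
# lines 46-48 of main_generate.py for the actual structure to edit.
candidate_model_paths = [
    "Qwen/Qwen2.5-7B",            # pretrained base (P)
    "bunsenfeng/yuru_qw_oasst1",  # SFT checkpoint (F)
    "Qwen/Qwen2.5-7B-Instruct",   # aligned, instruction-tuned (A)
]
# With 3 candidate models, pass 4 GPU ids via --gpu_ids (n candidates + 1 switcher).
```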
## Training Details

### Training Procedure
The Switch Generation framework involves training a "switcher LM." The switcher learns from the outcomes of choosing different models to generate the next segment of text across a diverse range of queries and contexts. This training lets the switcher dynamically identify and leverage the strengths of the candidate models at inference time.
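The switcher-training code is still pending (see "What's pending" above), so the following is only a hypothetical illustration of the data-collection recipe implied by the description, not the released procedure; `candidate_models` and `score` are placeholder objects.

```python
# Hypothetical illustration of collecting switcher-training examples, based on
# the description above. This is NOT the released training code (which is
# still pending); candidate_models and score are placeholder objects.

def collect_switching_examples(queries, candidate_models, score, segments_per_query=4):
    """For each query and growing context, let every candidate model propose the
    next segment, score the outcomes, and record which model did best."""
    examples = []
    for query in queries:
        context = ""
        for _ in range(segments_per_query):
            proposals = {
                name: model.generate_segment(query + context)
                for name, model in candidate_models.items()
            }
            # The training label is the model whose segment scores highest.
            best = max(proposals, key=lambda name: score(query, context, proposals[name]))
            examples.append({"query": query, "context": context, "label": best})
            context += proposals[best]  # continue from the winning segment
    return examples

# The switcher LM would then be finetuned to predict `label` from (query, context),
# so that at inference time it can route each segment to the right model.
```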
## Evaluation
Extensive experiments were conducted with 8 model collaboration baselines and 18 datasets. The key findings are:
- Model collaboration consistently outperforms individual models on 16 out of 18 tasks.
- Switch Generation outperforms these baselines by 12.9% on average. Analysis further shows that Switch Generation discovers compositional skills to solve problems where individual models struggle and generalizes to unseen models and tasks, reusing and repurposing by-products of expensive model training pipelines that would otherwise be discarded.
## Citation
If Switch Generation is helpful to you, please consider citing the paper:
```bibtex
@article{li2025dont,
  title={{Don't Throw Away Your Pretrained Model}},
  author={Li, Yuhui and Wei, Fangyun and Zhang, Chao and Zhang, Hongyang},
  journal={arXiv preprint arXiv:2510.09913},
  year={2025}
}
```