# Model Card for Switch Generation

This repository contains the Switch Generation framework and its associated switcher models, as presented in the paper *Don't Throw Away Your Pretrained Model*.
## Model Details

### Model Description
Switch Generation is a model collaboration framework designed to leverage the strengths of both pretrained and aligned language models. It addresses a tradeoff of alignment training: aligned models excel at reasoning and instruction following but may lose creativity and calibration. Switch Generation lets pretrained and aligned model versions dynamically take turns to "speak" within a single response.
Specifically, the framework trains a "switcher LM" on the outcomes of choosing different models to generate the next segment across diverse queries and contexts. At inference time, the switcher LM directs different model checkpoints to generate the next segment wherever their strengths are most needed. Extensive experiments show that model collaboration consistently outperforms individual models, and that Switch Generation improves performance further. The approach discovers compositional skills for solving complex problems, generalizes to unseen models and tasks, and effectively reuses and repurposes by-products of expensive model training.
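To make the segment-level switching concrete, below is a minimal, self-contained sketch of the inference loop. It is an illustration of the idea only: the class and method names (`CandidateModel`, `SwitcherLM`, `generate_segment`, `choose_model`) are hypothetical and do not correspond to the actual implementation in `main_generate.py`.

```python
# Minimal sketch of segment-level switch generation. All names below are
# illustrative placeholders, NOT the actual API of main_generate.py; they
# only mirror the behavior described above.

class CandidateModel:
    """Stand-in for one candidate checkpoint (e.g., the P, F, or A model)."""

    def __init__(self, name):
        self.name = name

    def generate_segment(self, context):
        # A real candidate model would decode tokens until a segment boundary.
        return f"[{self.name} segment] "


class SwitcherLM:
    """Stand-in for the switcher LM that decides which model 'speaks' next."""

    def choose_model(self, query, partial_response, candidates):
        # A real switcher conditions on the query and the partial response;
        # here we simply rotate through the candidates for illustration.
        return candidates[partial_response.count("segment") % len(candidates)]


def switch_generate(query, candidates, switcher, max_segments=6):
    """Let the switcher pick which candidate generates each successive segment."""
    response = ""
    for _ in range(max_segments):
        chosen = switcher.choose_model(query, response, candidates)
        response += chosen.generate_segment(query + response)
    return response


if __name__ == "__main__":
    models = [CandidateModel("P"), CandidateModel("F"), CandidateModel("A")]
    print(switch_generate("Write a short poem about autumn.", models, SwitcherLM()))
```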
- Developed by: Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang
- Model type: Causal Language Model (LoRA adapter) within a Mixture of Experts (MoE) text generation framework.
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: allenai/Llama-3.1-Tulu-3-8B
### Model Sources
- Repository: https://github.com/BunsenFeng/switch_generation
- Paper: https://huggingface.co/papers/2510.09913
## Uses

### Direct Use
Switch Generation is intended to enhance text generation by combining the strengths of multiple language models. It can produce responses that benefit from the instruction-following ability of aligned models and the creativity and calibration of pretrained base models. The switcher LM orchestrates this dynamic collaboration.
### Out-of-Scope Use
This model is not intended for standalone use as a general-purpose text generator outside the full Switch Generation framework, which relies on multiple candidate models. Using the switcher without proper integration into the system may lead to suboptimal performance. Users should also be aware of potential biases inherited from the underlying foundation models used in the collaboration.
## How to Get Started with the Model
To get started with the Switch Generation framework, follow the "Quick Start" instructions from the official GitHub repository.
### Initialization

Create a conda environment for Switch Generation:

```bash
conda env create -f switch.yml
conda activate switch_generation
```

Log into Hugging Face (for model access):

```bash
huggingface-cli login
```

Execute your first Switch Generation inference:

```bash
bash main.sh
```
`main.sh` by default contains:

```bash
python main_generate.py \
    --input data/input_sample.jsonl \
    --gpu_ids 0,1,2,3 \
    --overide_selector_path bunsenfeng/PFA_switcher_1 \
    --total_max_length 256
```
- `--input`: a JSONL file of inputs; see `data/input_sample.jsonl` for an example of how to prepare your custom inputs (a hedged sketch of the format also appears after this list). Output is written to the same directory, as `data/input_sample_switch_generation.jsonl`.
- `--gpu_ids`: a comma-separated string of GPU ids; 4 GPUs are needed (one each for the P, F, and A models and the switcher).
- `--overide_selector_path`: path to the switcher LM on Hugging Face. We provide `bunsenfeng/PFA_switcher_1` and `bunsenfeng/PFA_switcher_2`, which differ in task and training exposure; you can also try the aligned model itself (`allenai/Llama-3.1-Tulu-3-8B`) or any model that can follow instructions.
- `--total_max_length`: essentially `max_new_tokens`.
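As a rough guide to preparing a custom input file, here is a hedged sketch. The `prompt` key is an assumption, not a confirmed schema: check `data/input_sample.jsonl` and mirror its keys exactly.

```python
# Hypothetical example of preparing a custom input file. The real schema is
# defined by data/input_sample.jsonl; the "prompt" key below is an assumption.
import json

rows = [
    {"prompt": "Write a short poem about autumn."},
    {"prompt": "Explain the difference between a list and a tuple in Python."},
]

with open("data/my_inputs.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# After running main.sh with --input data/my_inputs.jsonl, output lands in the
# same directory (e.g., data/my_inputs_switch_generation.jsonl, following the
# naming pattern noted above).
```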
### Other Settings
- Your own data: format it like `data/input_sample.jsonl`.
- Your own candidate models: change lines 46-48 in `main_generate.py` (see the sketch after this list). Make sure `--gpu_ids` provides n+1 GPU ids, where n is the number of candidate models; you are not limited to 3 models. Another recommended set is `["Qwen/Qwen2.5-7B", "bunsenfeng/yuru_qw_oasst1", "Qwen/Qwen2.5-7B-Instruct"]`, where the middle entry is an SFT model we trained.
- What's pending: code for switcher training, code for the paper's evaluations, compatibility features such as running with fewer GPUs than n+1, etc.
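For illustration, swapping in the recommended Qwen candidate set around lines 46-48 of `main_generate.py` might look roughly like the following; the variable name is hypothetical, so match it to whatever the actual code uses.

```python
# Hypothetical sketch only: the variable name below is illustrative; see
# lines 46-48 of main_generate.py for the actual structure to edit.
candidate_model_paths = [
    "Qwen/Qwen2.5-7B",            # pretrained base (P)
    "bunsenfeng/yuru_qw_oasst1",  # SFT checkpoint (F)
    "Qwen/Qwen2.5-7B-Instruct",   # aligned, instruction-tuned (A)
]
# With 3 candidate models, pass 4 GPU ids via --gpu_ids (n candidates + 1 switcher).
```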
## Training Details

### Training Procedure
The Switch Generation framework involves training a "switcher LM." The switcher learns from the outcomes of choosing different models to generate the next segment of text across a diverse range of queries and contexts. This training lets the switcher dynamically identify and leverage the strengths of the candidate models at inference time.
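The switcher-training code is still pending (see "What's pending" above), so the following is only a hypothetical illustration of the data-collection recipe implied by the description, not the released procedure; `candidate_models` and `score` are placeholder objects.

```python
# Hypothetical illustration of collecting switcher-training examples, based on
# the description above. This is NOT the released training code (which is
# still pending); candidate_models and score are placeholder objects.

def collect_switching_examples(queries, candidate_models, score, segments_per_query=4):
    """For each query and growing context, let every candidate model propose the
    next segment, score the outcomes, and record which model did best."""
    examples = []
    for query in queries:
        context = ""
        for _ in range(segments_per_query):
            proposals = {
                name: model.generate_segment(query + context)
                for name, model in candidate_models.items()
            }
            # The training label is the model whose segment scores highest.
            best = max(proposals, key=lambda name: score(query, context, proposals[name]))
            examples.append({"query": query, "context": context, "label": best})
            context += proposals[best]  # continue from the winning segment
    return examples

# The switcher LM would then be finetuned to predict `label` from (query, context),
# so that at inference time it can route each segment to the right model.
```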
## Evaluation
Extensive experiments were conducted with 8 model collaboration baselines and 18 datasets. The key findings are:
- Model collaboration consistently outperforms individual models on 16 out of 18 tasks.
- Switch Generation outperforms these baselines by 12.9% on average. Analysis further shows that Switch Generation discovers compositional skills to solve problems where individual models struggle and generalizes to unseen models and tasks, reusing and repurposing by-products of expensive model training pipelines that would otherwise be discarded.
## Citation
If Switch Generation is helpful to you, please consider citing the paper:
```bibtex
@article{li2025dont,
  title={{Don't Throw Away Your Pretrained Model}},
  author={Li, Yuhui and Wei, Fangyun and Zhang, Chao and Zhang, Hongyang},
  journal={arXiv preprint arXiv:2510.09913},
  year={2025}
}
```