---
license: mit
host_model:
- toksuite/meta-llama-Llama-3.2-1B
- toksuite/Qwen-Qwen3-8B
tags:
- merge
- parameter-averaging
- flexitok
---

# Merged Model: qwen_onto_llama_lambda-0.5

This model is a result of **parameter averaging** (Model Soup) across 2 models.

### Merged Models

The following models were included in the merge:

- toksuite/meta-llama-Llama-3.2-1B
- toksuite/Qwen-Qwen3-8B

### Merging Configuration

- **Method**: Weighted Parameter Averaging
- **Weights**: Simple average with merging lambda = 0.5.
- **Excluded Layers**: Embeddings and LM Head were kept from the host model (toksuite/meta-llama-Llama-3.2-1B).

### Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("flexitok/qwen_onto_llama_lambda-0.5")
tokenizer = AutoTokenizer.from_pretrained("flexitok/qwen_onto_llama_lambda-0.5")
```
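
The merging configuration described above can be illustrated with a minimal sketch. This is not the script used to build this model. It assumes that lambda weights the second (donor) model and `1 - lambda` the host, and that the excluded layers can be recognised by the substrings `embed_tokens` and `lm_head` in their parameter names. The function name `merge_state_dicts` and the toy parameter names are hypothetical. Scalars stand in for tensors here; the same arithmetic applies to `torch.Tensor` values of matching shape.

```python
def merge_state_dicts(host, donor, lam=0.5, excluded=("embed_tokens", "lm_head")):
    """Interpolate two parameter dicts key by key (hypothetical helper).

    Parameters whose name contains one of the `excluded` markers are copied
    from `host` unchanged; every other entry becomes
    (1 - lam) * host + lam * donor.
    """
    if host.keys() != donor.keys():
        raise ValueError("both models must expose the same parameter names")

    merged = {}
    for name, host_value in host.items():
        if any(marker in name for marker in excluded):
            # Assumed naming convention: embeddings and LM head stay with the host.
            merged[name] = host_value
        else:
            merged[name] = (1.0 - lam) * host_value + lam * donor[name]
    return merged


# Toy parameter dicts; the names and values are made up for illustration.
host = {
    "model.embed_tokens.weight": 1.0,
    "model.layers.0.mlp.weight": 2.0,
    "lm_head.weight": 3.0,
}
donor = {
    "model.embed_tokens.weight": 9.0,
    "model.layers.0.mlp.weight": 4.0,
    "lm_head.weight": 7.0,
}

merged = merge_state_dicts(host, donor, lam=0.5)
print(merged)
```

With lambda = 0.5 the two models contribute equally, so the result does not depend on which model lambda is assumed to weight. For any other value, that assumption would need to be checked against the actual merge script.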