---
license: mit
host_model:
- toksuite/Qwen-Qwen3-8B
- toksuite/meta-llama-Llama-3.2-1B
tags:
- merge
- parameter-averaging
- flexitok
---

# Merged Model: llama_onto_qwen_lambda-0.5

This model is a result of **parameter averaging** (Model Soup) across 2 models.

### Merged Models

The following models were included in the merge:

- toksuite/Qwen-Qwen3-8B
- toksuite/meta-llama-Llama-3.2-1B

### Merging Configuration

- **Method**: Weighted Parameter Averaging
- **Weights**: Simple average with merging lambda = 0.5.
- **Excluded Layers**: Embeddings and LM Head were kept from the host model (toksuite/Qwen-Qwen3-8B).

### Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("flexitok/llama_onto_qwen_lambda-0.5")
tokenizer = AutoTokenizer.from_pretrained("flexitok/llama_onto_qwen_lambda-0.5")
```
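
For readers who want a concrete picture of the merging configuration described in this card, the following is a minimal sketch of weighted parameter averaging with excluded layers. It is not the script used to build this model. The function name `merge_state_dicts`, the substring filters `embed_tokens` and `lm_head`, and the convention that lambda weights the non-host model are all assumptions made for illustration. With lambda = 0.5 the direction of the weighting does not matter.

```python
def merge_state_dicts(host, donor, lam=0.5, excluded=("embed_tokens", "lm_head")):
    """Interpolate two parameter dicts; excluded entries are copied from the host.

    Values only need to support `*` and `+`, so plain floats and
    torch tensors (e.g. from `model.state_dict()`) both work.
    """
    merged = {}
    for name, host_value in host.items():
        if any(tag in name for tag in excluded) or name not in donor:
            # Assumption: embeddings, the LM head, and any parameter the
            # donor lacks are taken unchanged from the host model.
            merged[name] = host_value
        else:
            # Assumption: lam is the weight on the donor (non-host) model.
            merged[name] = (1.0 - lam) * host_value + lam * donor[name]
    return merged


# Toy example with scalar "parameters" standing in for weight tensors.
host = {
    "model.embed_tokens.weight": 1.0,
    "model.layers.0.mlp.up_proj.weight": 2.0,
    "lm_head.weight": 3.0,
}
donor = {
    "model.embed_tokens.weight": 10.0,
    "model.layers.0.mlp.up_proj.weight": 4.0,
    "lm_head.weight": 30.0,
}
merged = merge_state_dicts(host, donor, lam=0.5)
print(merged)
```

In the toy example the transformer-layer weight is averaged, while the embedding and LM head entries keep the host's values, which mirrors the "Excluded Layers" entry in the configuration.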