qwen3-8b-merge-openentry

By combining Qwen3-8B-Base (strong general language understanding) with DeepSeek-R1-0528-Qwen3-8B (powerful reasoning and code/math ability), this merge captures the best of both worlds.

No full model overwrite: Instead of replacing the entire base model, DELLA injects only the delta weights (the parameter differences) from the SFT model.

Lighter than LoRA: An unmerged LoRA adapter adds extra parameters and computation at inference time. DELLA folds the delta directly into the base weights, so no extra layers or computation are added at runtime.

Faster than SFT: No supervised fine-tuning (SFT) run is required. DELLA merges already-learned changes, meaning no training time and much faster deployment.

More memory-efficient: DELLA doesn't duplicate model parameters (unlike LoRA or other adapters), resulting in lower RAM and VRAM usage during inference.

Maintains base model stability: By merging only "what matters" (the fine-tuned deltas), the base model's stability and general language ability remain intact.

Extracts only what works: DELLA selectively transfers only the useful learned features from the fine-tuned SFT model, such as better instruction following, reasoning, or coding ability. A toy sketch of this delta-merging idea follows this list.
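
Taken together, these points describe task-vector merging: compute the delta between the SFT model and the base, thin it out, rescale, and add it back onto the base. The snippet below is a minimal toy sketch of that idea, not mergekit's actual DELLA implementation: it drops delta entries uniformly at random (DARE-style), whereas DELLA samples drop probabilities from the delta magnitudes. The function name and the drop_p/weight values are illustrative.

import torch

def drop_and_rescale_merge(base_sd, sft_sd, drop_p=0.5, weight=1.0, seed=0):
    # base_sd / sft_sd: state_dicts of the base and fine-tuned models.
    # For each tensor: delta = sft - base, randomly drop a fraction drop_p
    # of the delta entries, rescale the survivors by 1/(1 - drop_p), and
    # add the result back onto the base weights.
    gen = torch.Generator().manual_seed(seed)
    merged = {}
    for name, base_w in base_sd.items():
        delta = sft_sd[name].float() - base_w.float()   # task vector
        keep = torch.rand(delta.shape, generator=gen) >= drop_p
        delta = delta * keep / (1.0 - drop_p)           # rescale survivors
        merged[name] = (base_w.float() + weight * delta).to(base_w.dtype)
    return merged

# usage: merged = drop_and_rescale_merge(base.state_dict(), sft.state_dict())
#        base.load_state_dict(merged)

DELLA's magnitude-based sampling keeps high-magnitude (more informative) delta entries with higher probability, which is what lets it "extract only what works".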

Merge Method

This model was merged using the DELLA merge method (drop-and-rescale delta merging).

Models Merged

The following models were included in the merge:

Qwen/Qwen3-8B-Base (base model)
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
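
As an illustration, a merge like this can be expressed as a mergekit configuration using the della merge method. The sketch below writes such a config and invokes the mergekit-yaml CLI; the density and weight values are hypothetical placeholders, not necessarily the settings used for this model.

import pathlib
import subprocess

# Illustrative DELLA config; density/weight are placeholder values.
CONFIG = """\
merge_method: della
base_model: Qwen/Qwen3-8B-Base
models:
  - model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
    parameters:
      density: 0.6   # fraction of delta entries retained
      weight: 0.5    # scale applied to the retained delta
dtype: bfloat16
"""

pathlib.Path("della.yml").write_text(CONFIG)
subprocess.run(["mergekit-yaml", "della.yml", "./qwen3-8b-merge-openentry"], check=True)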

Test

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "openentry/qwen3-8b-merge-openentry"  # or a local path to the merged checkpoint

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parse the thinking content
try:
    # find the index just after the last 151668 (</think>) token
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
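
Because enable_thinking controls whether the model emits a <think> reasoning block, the parsing step above is only needed in thinking mode. A minimal sketch of non-thinking generation, reusing the tokenizer, model, and messages defined above:

# Non-thinking mode: no <think> block is emitted, so the output can be
# decoded directly without locating the </think> token (id 151668).
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=1024)
print(tokenizer.decode(
    generated_ids[0][len(model_inputs.input_ids[0]):],
    skip_special_tokens=True,
))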

Citation

If you find our work helpful, feel free to cite us.

OpenEntry Corp.

@misc{openentrymergereport,
  title        = {Merged DeepSeek R1 and Qwen3-8B-Base using DELLA},
  author       = {openentry},
  year         = {2025},
}

Qwen3

@misc{qwen3technicalreport,
  title        = {Qwen3 Technical Report},
  author       = {Qwen Team},
  year         = {2025},
  eprint       = {2505.09388},
  archivePrefix= {arXiv},
  primaryClass = {cs.CL},
  url          = {https://arxiv.org/abs/2505.09388}
}

DeepSeek-R1

@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
  title        = {DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
  author       = {DeepSeek-AI},
  year         = {2025},
  eprint       = {2501.12948},
  archivePrefix= {arXiv},
  primaryClass = {cs.CL},
  url          = {https://arxiv.org/abs/2501.12948}
}

Contact

If you have any questions, please raise an issue or contact us at [email protected].
