ZeroXClem/Qwen2.5-7B-DistilPrism
Qwen2.5-7B-DistilPrism is a distillation- and reasoning-focused model merge that combines multiple DeepSeek-R1 distillation variants into a single refined, high-performance language model. Built with the Model Stock merge method, the fusion aims to capture the best attributes of DeepSeek-R1-Distill-Qwen-7B and its improved derivatives.
🚀 Merged Models
This model is a weighted merge of the following:
- huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2: An uncensored distillation of DeepSeek-R1, optimized to remove refusals and improve usability.
- mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1: A refined distillation that improves accuracy and robustness across various benchmarks.
- Triangle104/DSR1-Distill-Qwen-7B-RP: A composite merge of various distilled DeepSeek variants, serving as an essential ingredient for performance tuning.
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B: The foundation of this merge, representing the distilled form of DeepSeek-R1 optimized for efficiency and strong reasoning capabilities.
🧩 Merge Configuration
The following YAML configuration defines how these models were combined using Model Stock, ensuring balanced contributions from each source:
```yaml
# Merge configuration for ZeroXClem/Qwen2.5-7B-DistilPrism using Model Stock
name: ZeroXClem-Qwen2.5-7B-DistilPrism
merge_method: model_stock
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tokenizer_source: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
dtype: bfloat16
parameters:
  normalize: true
  rescale: true
models:
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2
    parameters:
      weight: 0.3
  - model: mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1
    parameters:
      weight: 0.25
  - model: Triangle104/DSR1-Distill-Qwen-7B-RP
    parameters:
      weight: 0.2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
    parameters:
      weight: 0.25
```
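To reproduce the merge, a minimal sketch using the mergekit CLI (assuming the configuration above is saved as `config.yaml`; the command and flags reflect current mergekit releases and may change):

```bash
pip install mergekit
# Run the model_stock merge defined in config.yaml and write the result locally
mergekit-yaml config.yaml ./Qwen2.5-7B-DistilPrism --cuda
```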
🔑 Key Parameters
- Normalization & Rescaling: Ensures weight distributions remain balanced across all components.
- Model Stock Merge Method: Optimizes contribution from each model to retain the best attributes.
- Weighted Blending: The abliterated model carries the largest weight (0.30), with the re-distilled model and the base distill at 0.25 each, refining both alignment behavior and general usability (see the sketch after this list).
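For intuition, here is a naive weighted average of parameter tensors in PyTorch. This is only a simplified sketch: mergekit's `model_stock` method additionally applies geometry-aware interpolation plus the `normalize`/`rescale` options above, so do not treat this as the exact algorithm used to build the model.

```python
import torch
from transformers import AutoModelForCausalLM

# Repo names and weights mirror the merge config above; the plain averaging
# here is a simplification of what mergekit's model_stock method actually does.
sources = {
    "huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2": 0.30,
    "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1": 0.25,
    "Triangle104/DSR1-Distill-Qwen-7B-RP": 0.20,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B": 0.25,
}

merged = None
for name, weight in sources.items():
    state = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.bfloat16
    ).state_dict()
    if merged is None:
        merged = {k: weight * v.float() for k, v in state.items()}
    else:
        for k, v in state.items():
            merged[k] += weight * v.float()

# The weights sum to 1.0, so the blend is already normalized.
base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", torch_dtype=torch.bfloat16
)
base.load_state_dict({k: v.to(torch.bfloat16) for k, v in merged.items()})
base.save_pretrained("./naive-weighted-merge")
```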
🗣️ Inference
You can use the model for text generation as follows:
Ollama
I recommend Ollama as a daily driver, as it supports thinking tags (see the Ollama quickstart guide).

```bash
ollama run hf.co/ZeroXClem/Qwen2.5-7B-DistilPrism

# For quantized versions, copy the repository URL and replace 'huggingface.co/'
# with 'hf.co/', followed by the name of the quant.
```
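For illustration, the pattern looks like this (the repository name and quant tag below are placeholders, not a published quant):

```bash
# Placeholder repo and quant tag -- substitute the quant you actually use
ollama run hf.co/YourUser/Qwen2.5-7B-DistilPrism-GGUF:Q4_K_M
```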
Transformers
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Define the model name
model_name = "ZeroXClem/Qwen2.5-7B-DistilPrism"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Initialize the pipeline (dtype and device placement are already set on the model)
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Define the input prompt
prompt = "Explain the significance of artificial intelligence in modern healthcare."

# Generate the output
outputs = text_generator(
    prompt,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)

# Print the generated text
print(outputs[0]["generated_text"])
```
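Because the underlying DeepSeek-R1 distills are chat models trained to emit reasoning traces, you will generally get better results by going through the chat template. A minimal sketch, assuming the merge inherits the DeepSeek-R1 chat template (consistent with the `tokenizer_source` in the config above) and reusing `model` and `tokenizer` from the snippet above:

```python
# Build a chat-formatted prompt and sample with settings in the range
# DeepSeek recommends for its R1 distills (temperature around 0.6)
messages = [
    {"role": "user", "content": "What is 17 * 24? Think step by step."}
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

# The response may include a visible reasoning trace (e.g., <think>...</think>)
# before the final answer, depending on the chat template.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```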
🎯 Use Case & Applications
Qwen2.5-7B-DistilPrism is designed for efficient, high-quality text generation with strong reasoning capabilities. It is well-suited for:
- Advanced Reasoning & Problem Solving: Excels in logic-heavy tasks and multi-step reasoning problems.
- Conversational AI: Optimized for fluid, responsive dialogue, reducing refusals and improving engagement.
- Mathematical & Scientific Computation: Enhanced math & code generation abilities compared to standard distillations.
- Content Creation & Summarization: Generates coherent and contextually rich text suitable for various applications.
📜 License
This model is released under the MIT License.
📊 Benchmark Results (Coming Soon)
We are currently quantizing and benchmarking this model. Stay tuned for performance updates across:
- IFEval (0-Shot)
- BBH (3-Shot)
- MATH (4-Shot)
- GPQA (0-Shot)
- MuSR (0-Shot)
- MMLU-PRO (5-Shot)
💡 Tags
- merge
- mergekit
- model_stock
- DeepSeek-R1
- Distillation
- abliterated
- re-distilled
- DeepSeek-R1-Distill-Qwen-7B
🙏 Special Thanks
This project wouldn't be possible without the incredible contributions from:
- @huihui-ai – For developing DeepSeek-R1-Distill-Qwen-7B-abliterated-v2, a bold step towards improving model alignment.
- @mobiuslabsgmbh – For refining distillation techniques with DeepSeek-R1-ReDistill-Qwen-7B-v1.1.
- @Triangle104 – For crafting innovative merges like DSR1-Distill-Qwen-7B-RP, an essential component in this blend.
- @deepseek-ai – For open-sourcing DeepSeek-R1-Distill-Qwen-7B, a foundation for reasoning advancements.
And a heartfelt thank you to everyone in the 🤗 & Open-Source AI community for their continued research, testing, and support. 💜🚀