---
license: apache-2.0
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
- Qwen/Qwen3-8B
pipeline_tag: text-generation
tags:
- merge
---
> The Karcher merge method does not require the use of a base model.
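As intuition for why no base model is needed: the Karcher (Riemannian) mean averages points on a curved manifold directly, with no designated reference point. Below is a minimal conceptual sketch of a Karcher mean on the unit hypersphere via iterated log/exp maps. This is an illustration of the underlying math only, not mergekit's actual tensor-merging code.

```python
import numpy as np

def karcher_mean_sphere(points, max_iter=1000, tol=1e-10):
    """Karcher (Riemannian) mean of unit vectors on the hypersphere,
    found by gradient descent using the sphere's log/exp maps."""
    mu = points.mean(axis=0)
    mu /= np.linalg.norm(mu)
    for _ in range(max_iter):
        # Log map: lift each point into the tangent space at mu.
        tangents = []
        for p in points:
            cos_t = np.clip(mu @ p, -1.0, 1.0)
            theta = np.arccos(cos_t)
            if theta < 1e-12:
                tangents.append(np.zeros_like(mu))
            else:
                tangents.append(theta * (p - cos_t * mu) / np.sin(theta))
        step = np.mean(tangents, axis=0)
        norm = np.linalg.norm(step)
        if norm < tol:  # mean tangent ~ 0: mu is the Karcher mean
            break
        # Exp map: move mu along the averaged tangent direction.
        mu = np.cos(norm) * mu + np.sin(norm) * step / norm
        mu /= np.linalg.norm(mu)
    return mu
```

Because the iteration starts from the (normalized) Euclidean average and refines it geometrically, every input model plays a symmetric role; none has to serve as a base.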
## Model Highlights

- **Merge method**: `karcher`
- **Highest precision**: `dtype: float32` + `out_dtype: bfloat16`
- **Brand-new chat template**: ensures normal operation on LM Studio
- **Context length**: `131072`
## Model Selection Table

| Model | Context | Uses Base Model |
|---|---|---|
| Qwen3-8B-YOYO-karcher | 32K | NO |
| Qwen3-8B-YOYO-karcher-128K | 128K | NO |
| Qwen3-EZO-8B-YOYO-karcher | 32K | NO |
| Qwen3-EZO-8B-YOYO-karcher-128K | 128K | NO |
> **Warning**: Models with `128K` context may have slight quality loss. In most cases, please use the `32K` native context!
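For reference, Qwen3's published recipe for extending context beyond the native 32K window is YaRN rope scaling, so the 128K variants presumably rely on a `config.json` fragment along these lines (values taken from Qwen3's own documentation; verify against each repository's actual config):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

A factor of 4.0 over the native 32768-token window yields the 131072-token context listed above.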
## Parameter Settings

### Thinking Mode

`Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`.
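These settings map directly onto Hugging Face `transformers` `generate()` keyword arguments. A minimal sketch (the model and inputs are hypothetical placeholders; note that `min_p` requires a reasonably recent `transformers` release):

```python
# Recommended thinking-mode sampler settings, expressed as generate() kwargs.
thinking_sampling = dict(
    do_sample=True,   # sampling must be enabled for temperature/top_p/top_k to apply
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    min_p=0.0,
)

# Hypothetical usage (model/tokenizer loading omitted):
# outputs = model.generate(**inputs, **thinking_sampling)
```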
## Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
  - model: Qwen/Qwen3-8B
merge_method: karcher
parameters:
  max_iter: 1000
dtype: float32
out_dtype: bfloat16
tokenizer_source: Qwen/Qwen3-8B
```