---
license: apache-2.0
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
- Qwen/Qwen3-8B
pipeline_tag: text-generation
tags:
- merge
---

> [!TIP]
> The Karcher merge method does not require a base model. See the [mergekit documentation](https://github.com/arcee-ai/mergekit/blob/main/docs/merge_methods.md#karcher-mean-karcher) for details.

# *Model Highlights:*

- ***Merge method**: `karcher`*
- ***Highest precision**: `dtype: float32` + `out_dtype: bfloat16`*
- ***Brand-new chat template**: ensures normal operation on LM Studio*
- ***Context length**: `131072`*

## *Model Selection Table:*

|Model|Context|Uses Base Model|
|---|---|---|
|[Qwen3-8B-YOYO-karcher](https://huggingface.co/YOYO-AI/Qwen3-8B-YOYO-karcher)|32K|No|
|[Qwen3-8B-YOYO-karcher-128K](https://huggingface.co/YOYO-AI/Qwen3-8B-YOYO-karcher-128K)|128K|No|
|[Qwen3-EZO-8B-YOYO-karcher](https://huggingface.co/YOYO-AI/Qwen3-EZO-8B-YOYO-karcher)|32K|No|
|[Qwen3-EZO-8B-YOYO-karcher-128K](https://huggingface.co/YOYO-AI/Qwen3-EZO-8B-YOYO-karcher-128K)|128K|No|

> **Warning**:
> *Models with `128K` context may show a slight loss in quality. In most cases, use the native `32K` context.*

# *Parameter Settings:*

## *Thinking Mode:*

> [!NOTE]
> *`Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`.*

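As a minimal sketch, the thinking-mode settings above can be collected into a dictionary of sampling arguments. The helper name `thinking_mode_sampling_kwargs` is hypothetical, and the key names assume a runtime that follows the Hugging Face `transformers` naming for `generate`. Other runtimes, such as LM Studio, expose the same settings under their own names.

```python
def thinking_mode_sampling_kwargs():
    """Return the thinking-mode sampling settings from this card as keyword arguments.

    Key names are an assumption: they follow the Hugging Face transformers
    convention (temperature, top_p, top_k, min_p).
    """
    return {
        "do_sample": True,   # sampling must be enabled for the values below to take effect
        "temperature": 0.6,  # Temperature=0.6
        "top_p": 0.95,       # TopP=0.95
        "top_k": 20,         # TopK=20
        "min_p": 0.0,        # MinP=0
    }


if __name__ == "__main__":
    for name, value in thinking_mode_sampling_kwargs().items():
        print(f"{name} = {value}")
```

With `transformers`, a dictionary like this would typically be unpacked into the generation call, for example `model.generate(**inputs, **thinking_mode_sampling_kwargs())`, where `model` and `inputs` are whatever the caller has already prepared.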
# *Configuration:*

*The following YAML configuration was used to produce this model:*

```yaml
models:
  - model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
  - model: Qwen/Qwen3-8B
merge_method: karcher
parameters:
  max_iter: 1000
dtype: float32
out_dtype: bfloat16
tokenizer_source: Qwen/Qwen3-8B
```
