YOYO-AI
/

Qwen3-8B-YOYO-karcher-128K

Text Generation

Model card Files Files and versions

Qwen3-8B-YOYO-karcher-128K / README.md

YOYO-AI's picture

Update README.md

ff701a6 verified 4 months ago

|

history blame contribute delete

1.59 kB

	---
	license: apache-2.0
	language:
	- en
	- zh
	base_model:
	- deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
	- Qwen/Qwen3-8B
	pipeline_tag: text-generation
	tags:
	- merge
	---
	> [!TIP]
	> The Karcher merge method does not require the use of a base model. Click [here](https://github.com/arcee-ai/mergekit/blob/main/docs/merge_methods.md#karcher-mean-karcher) for details.

	# Model Highlights:

	- *merge method: `karcher`*

	- *Highest precision: `dtype: float32` + `out_dtype: bfloat16`*

	- *Brand-new chat template: ensures normal operation on LM Studio*

	- *Context length: `131072`*
	## Model Selection Table:
	\|Model\|Context\|Uses Basic Model\|
	\|---\|---\|---\|
	\|[Qwen3-8B-YOYO-karcher](https://huggingface.co/YOYO-AI/Qwen3-8B-YOYO-karcher)\|32K\|NO\|
	\|[Qwen3-8B-YOYO-karcher-128K](https://huggingface.co/YOYO-AI/Qwen3-8B-YOYO-karcher-128K)\|128K\|NO\|
	\|[Qwen3-EZO-8B-YOYO-karcher](https://huggingface.co/YOYO-AI/Qwen3-EZO-8B-YOYO-karcher)\|32K\|NO\|
	\|[Qwen3-EZO-8B-YOYO-karcher-128K](https://huggingface.co/YOYO-AI/Qwen3-EZO-8B-YOYO-karcher-128K)\|128K\|NO\|
	> Warning:
	> Models with `128K` context may have slight quality loss. In most cases, please use the `32K` native context!
	# Parameter Settings:
	## Thinking Mode:
	> [!NOTE]
	> `Temperature=0.6`, `TopP=0.95`, `TopK=20`,`MinP=0`.

	# Configuration:
	The following YAML configuration was used to produce this model:

	```yaml
	models:
	- model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
	- model: Qwen/Qwen3-8B
	merge_method: karcher
	parameters:
	max_iter: 1000
	dtype: float32
	out_dtype: bfloat16
	tokenizer_source: Qwen/Qwen3-8B
	```