Update README.md
README.md CHANGED
@@ -18,7 +18,7 @@ tags:
 
 - ***precision**: `dtype: bfloat16`*
 
-- ***Context length**: `
+- ***Context length**: `1010000`*
 
 # *Parameter Settings:*
 ## *Non-Thinking Mode: (`set thinking = false`)*
@@ -49,9 +49,9 @@ tokenizer_source: base
 dtype: bfloat16
 name: Qwen3-30B-A3B-Mixture-V2
 ```
-## *Step3: Incorporating Code Model*
+## *Step3: Incorporating Code Model and Adjusting Context Length*
 - *After numerous attempts, we unexpectedly discovered that in the scenario of merging code models, the nearswap merging method performs exceptionally well—it not only outperforms all previous baseline methods but also further improves code performance while preserving the characteristics of the base model.*
-- *
+- *By referring to the config_1m.json of Qwen3-30B-A3B-Instruct-2507, we modified the config.json of the merged model and extended the maximum context length to 1M.*
 ```yaml
 models:
 - model: Qwen/Qwen3-Coder-30B-A3B-Instruct