YOYO-AI committed
Commit a1dc77b · verified · 1 Parent(s): 25d6efa

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -18,7 +18,7 @@ tags:
 
 - ***precision**: `dtype: bfloat16`*
 
-- ***Context length**: `262144`*
+- ***Context length**: `1010000`*
 
 # *Parameter Settings:*
 ## *Non-Thinking Mode: (`set thinking = false`)*
@@ -49,9 +49,9 @@ tokenizer_source: base
 dtype: bfloat16
 name: Qwen3-30B-A3B-Mixture-V2
 ```
-## *Step3: Incorporating Code Model*
+## *Step3: Incorporating Code Model and Adjusting Context Length*
 - *After numerous attempts, we unexpectedly discovered that in the scenario of merging code models, the nearswap merging method performs exceptionally well—it not only outperforms all previous baseline methods but also further improves code performance while preserving the characteristics of the base model.*
-- *To ensure optimal performance, we have chosen to use a native 256K context. Of course, you can click [here](https://huggingface.co/YOYO-AI/Qwen3-30B-A3B-YOYO-V2/tree/main) to download the config.json that supports a 1M context.*
+- *By referring to the config_1m.json of Qwen3-30B-A3B-Instruct-2507, we modified the config.json of the merged model and extended the maximum context length to 1M.*
 ```yaml
 models:
 - model: Qwen/Qwen3-Coder-30B-A3B-Instruct
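
The Step 3 YAML is truncated by the diff context above. As a point of reference only, a minimal mergekit sketch of a nearswap merge in the shape this recipe describes might look like the following; the `t` value and the output `name` are illustrative assumptions (the name is borrowed from the repo URL in the removed bullet), not the contents of the actual file:

```yaml
# Hypothetical Step 3 sketch (NOT the repo's exact config): fold the code
# model into the Step 2 mixture with mergekit's nearswap method, which
# interpolates toward the secondary model only where its weights stay
# close to the base model's weights.
models:
  - model: Qwen/Qwen3-Coder-30B-A3B-Instruct   # code model added in Step 3
merge_method: nearswap
base_model: Qwen3-30B-A3B-Mixture-V2           # Step 2 output named above
parameters:
  t: 0.0001                                    # similarity threshold; assumed value
dtype: bfloat16
name: Qwen3-30B-A3B-YOYO-V2                    # assumed output name, from the repo URL
```

If the real recipe is close to this, it would be run the same way as the earlier steps, e.g. `mergekit-yaml step3.yaml ./Qwen3-30B-A3B-YOYO-V2`.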
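
The new Step 3 bullet says the 1M window comes from porting fields out of Qwen3-30B-A3B-Instruct-2507's config_1m.json. A plausible shape for that edit, assuming the standard transformers YaRN rope_scaling convention, is sketched below; JSON cannot carry comments, so note here that 1010000 and 262144 are the figures quoted in this README, while the yarn factor and exact field set are assumptions to verify against the upstream config_1m.json:

```json
{
  "max_position_embeddings": 1010000,
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```

Here 262144 is the merged model's native window and 4.0 × 262144 = 1048576 comfortably covers the advertised 1010000 tokens; copy the real values from the upstream file rather than trusting these.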