Update README.md
README.md CHANGED
@@ -18,7 +18,7 @@ tags:
 
 - ***precision**: `dtype: bfloat16`*
 
-- ***Context length**: `
+- ***Context length**: `1010000`*
 
 # *Parameter Settings:*
 ## *Non-Thinking Mode: (`set thinking = false`)*
@@ -49,9 +49,9 @@ tokenizer_source: base
 dtype: bfloat16
 name: Qwen3-30B-A3B-Mixture-V2
 ```
-## *Step3: Incorporating Code Model*
+## *Step3: Incorporating Code Model and Adjusting Context Length*
 - *After numerous attempts, we unexpectedly discovered that in the scenario of merging code models, the nearswap merging method performs exceptionally well—it not only outperforms all previous baseline methods but also further improves code performance while preserving the characteristics of the base model.*
-- *
+- *By referring to the config_1m.json of Qwen3-30B-A3B-Instruct-2507, we modified the config.json of the merged model and extended the maximum context length to 1M.*
 ```yaml
 models:
 - model: Qwen/Qwen3-Coder-30B-A3B-Instruct