---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen3-30B-A3B-Thinking-2507
- Qwen/Qwen3-30B-A3B-Instruct-2507
- Qwen/Qwen3-Coder-30B-A3B-Instruct
pipeline_tag: text-generation
tags:
- merge
---

> *This model fills the gap left by the absence of a hybrid model in the Qwen 2507 series, combining the strengths of the instruction, reasoning, and coding models. It is an excellent choice for local deployment!*

# *Model Highlights:*

- ***merge method**: `nuslerp`, `nearswap`*
- ***precision**: `dtype: bfloat16`*
- ***context length**: `1010000`*

# *Parameter Settings:*

## *Non-Thinking Mode (`set thinking = false`)*

> [!TIP]
> *`Temperature=0.7`, `TopP=0.8`, `TopK=20`, `MinP=0`.*

## *Thinking Mode (`set thinking = true`)*

> [!NOTE]
> *`Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`.*

## *Step 1: A Unified Chat Template for the Instruct and Thinking Models*

- *Inspired by DeepSeek-V3.1, we unified the chat templates of Qwen3-2507-Instruct and Qwen3-2507-Thinking.*
- *You can now switch between **non-thinking** and **thinking** modes by setting `thinking` to `false` or `true`!*

```jinja
{%- if not thinking is defined %}{% set thinking = false %}{% endif %}
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
{%- for message in messages %}
    {%- if message.content is string %}
        {%- set content = message.content %}
    {%- else %}
        {%- set content = '' %}
    {%- endif %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {%- if thinking %}
                    {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
                {%- else %}
                    {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
                {%- endif %}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {%- if thinking %}
        {{- '<|im_start|>assistant\n<think>\n' }}
    {%- else %}
        {{- '<|im_start|>assistant\n' }}
    {%- endif %}
{%- endif %}
```

## *Step 2: Merging the Instruct and Thinking Models*

- *We improved the mixing ratio of Qwen3-30B-A3B-Mixture-2507 so that the model switches smoothly between thinking and non-thinking modes!*

```yaml
models:
  - model: Qwen/Qwen3-30B-A3B-Thinking-2507
    parameters:
      weight: 1.4
  - model: Qwen/Qwen3-30B-A3B-Instruct-2507
    parameters:
      weight: 0.6
merge_method: nuslerp
tokenizer_source: base
dtype: bfloat16
name: Qwen3-30B-A3B-Mixture-V2
```

## *Step 3: Incorporating the Code Model and Extending the Context Length*

- *After numerous attempts, we unexpectedly found that when merging in the code model, the nearswap method performs exceptionally well: it outperforms all of our previous baseline methods and further improves coding performance while preserving the characteristics of the base model.*
- *Following the config_1m.json of Qwen3-30B-A3B-Instruct-2507, we modified the merged model's config.json and extended the maximum context length to 1M tokens.*

```yaml
models:
  - model: Qwen/Qwen3-Coder-30B-A3B-Instruct
merge_method: nearswap
base_model: Qwen3-30B-A3B-Mixture-V2
parameters:
  t:
    - value: 0.001
tokenizer_source: base
dtype: bfloat16
```

As a result, we have successfully built a powerful dual-mode local coding model. Finally, we hope it will serve as a small base model for distilling DeepSeek-V3.1!
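The effect of the `thinking` switch can be sketched by rendering just the final `add_generation_prompt` branch of the Step 1 template with plain `jinja2`. This is a minimal illustration, not the full template: the snippet below inlines only that branch, and the behavior shown (prefilling an open `<think>` block in thinking mode) follows the template above.

```python
from jinja2 import Template

# Only the add_generation_prompt branch of the Step 1 template,
# inlined here for illustration.
snippet = (
    "{%- if add_generation_prompt %}"
    "{%- if thinking %}"
    "{{- '<|im_start|>assistant\\n<think>\\n' }}"
    "{%- else %}"
    "{{- '<|im_start|>assistant\\n' }}"
    "{%- endif %}"
    "{%- endif %}"
)
tpl = Template(snippet)

# thinking=true pre-opens a <think> block, so generation starts with reasoning.
print(repr(tpl.render(add_generation_prompt=True, thinking=True)))
# thinking=false starts a plain assistant turn.
print(repr(tpl.render(add_generation_prompt=True, thinking=False)))
```

In practice the same switch is passed through `tokenizer.apply_chat_template(..., thinking=True)`, since extra keyword arguments are forwarded to the template as variables.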
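The per-mode sampling settings recommended above can be collected into a small helper. The values come from this card; the helper itself (`sampling_for`) is only an illustrative convenience, and the key names should be adapted to your inference stack (e.g. transformers `GenerationConfig`, llama.cpp, vLLM).

```python
# Sampling presets recommended on this card, keyed by the thinking flag.
SAMPLING_PRESETS = {
    True:  {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},  # thinking mode
    False: {"temperature": 0.7, "top_p": 0.8,  "top_k": 20, "min_p": 0.0},  # non-thinking mode
}

def sampling_for(thinking: bool) -> dict:
    """Return the recommended sampler settings for the given mode."""
    return SAMPLING_PRESETS[thinking]
```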