This model fills a gap in the Qwen 2507 series, which lacks a hybrid model, by combining the strengths of the instruction, reasoning, and coding models.
Model Highlights:
- merge method: nuslerp, nearswap
- precision: bfloat16
- context length: 1,010,000
Parameter Settings:
- Non-Thinking Mode (set thinking = false):
  - Temperature=0.7, TopP=0.8, TopK=20, MinP=0.
- Thinking Mode One (set thinking = false):
  - Use the following prompt: Enable deep thinking subroutine.
- Thinking Mode Two (set thinking = true):
  - No prompt needed.
- Both Thinking Modes:
  - Temperature=0.6, TopP=0.95, TopK=20, MinP=0.
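The recommended decoding settings above can be collected into a small helper. This is a sketch, not part of the model's tooling; the key names follow the common sampling vocabulary used by engines such as vLLM, which is an assumption here:

```python
def sampling_params(thinking: bool) -> dict:
    """Recommended decoding settings from this model card.

    thinking=False -> non-thinking mode; thinking=True -> either thinking mode.
    """
    if thinking:
        return {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0}
    return {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0}
```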
Step1: Chat Template for Unified Instruct Model and Thinking Model
- Inspired by DeepSeek-V3.1, we have unified the chat templates of Qwen3-2507-Instruct and Qwen3-2507-Thinking.
- You can now switch between Non-Thinking and Thinking modes by setting `thinking` to `false` or `true`:
```jinja
{%- if not thinking is defined %}{% set thinking = false %}{% endif %}
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
{%- for message in messages %}
    {%- if message.content is string %}
        {%- set content = message.content %}
    {%- else %}
        {%- set content = '' %}
    {%- endif %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {%- if thinking %}
                    {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
                {%- else %}
                    {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
                {%- endif %}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {%- if thinking %}
        {{- '<|im_start|>assistant\n<think>\n' }}
    {%- else %}
        {{- '<|im_start|>assistant\n' }}
    {%- endif %}
{%- endif %}
```
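The final `add_generation_prompt` branch of the template is what actually performs the mode switch: with `thinking = true` the generation prompt pre-opens a `<think>` block, so the model starts by emitting its reasoning chain. A minimal Python sketch of just that branch (not the full template):

```python
def generation_prompt(thinking: bool) -> str:
    """Mirror the template's add_generation_prompt branch.

    Thinking mode pre-opens a <think> block; non-thinking mode
    starts the assistant turn directly.
    """
    if thinking:
        return "<|im_start|>assistant\n<think>\n"
    return "<|im_start|>assistant\n"
```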
Step2: Hybrid Instruct Model and Thinking Model
- We have improved the mixing ratio of Qwen3-30B-A3B-Mixture-2507, enabling the model to switch smoothly between thinking and non-thinking modes!
```yaml
models:
  - model: Qwen/Qwen3-30B-A3B-Thinking-2507
    parameters:
      weight: 1.4
  - model: Qwen/Qwen3-30B-A3B-Instruct-2507
    parameters:
      weight: 0.6
merge_method: nuslerp
tokenizer_source: base
dtype: bfloat16
name: Qwen3-30B-A3B-Mixture-V2
```
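For intuition, nuslerp blends checkpoints by spherical interpolation of their parameters rather than plain averaging. Below is a rough stdlib-only sketch of slerp on flat vectors; mergekit's actual implementation normalizes per-tensor and handles many edge cases, and mapping the 1.4/0.6 weights to a single interpolation factor is our illustrative assumption:

```python
import math

def slerp(v0, v1, t):
    """Spherical linear interpolation between two parameter vectors."""
    dot = sum(a * b for a, b in zip(v0, v1))
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    cos_omega = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    omega = math.acos(cos_omega)
    if omega < 1e-8:  # nearly parallel: fall back to linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# Illustrative: weights 1.4 (Thinking) and 0.6 (Instruct) normalized to a
# single interpolation factor toward the Instruct model.
t = 0.6 / (1.4 + 0.6)  # 0.3
thinking_vec = [1.0, 0.0]
instruct_vec = [0.0, 1.0]
blended = slerp(thinking_vec, instruct_vec, t)
```

Unlike linear averaging, slerp keeps the interpolated vector on the arc between the two inputs, preserving its norm for unit inputs.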
Step3: Incorporating Code Model and Adjusting Context Length
- After numerous attempts, we unexpectedly discovered that when merging code models, the nearswap method performs exceptionally well: it outperforms all of our previous baseline methods and further improves code performance while preserving the characteristics of the base model.
- By referring to the config_1m.json of Qwen3-30B-A3B-Instruct-2507, we modified the config.json of the merged model and extended the maximum context length to 1M.
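The context extension amounts to editing the merged model's config.json. A minimal sketch (note that config_1m.json also carries rope_scaling settings for 1M context, which should be copied over verbatim; this sketch only touches the length field, and the demo file path is a throwaway):

```python
import json
import os
import tempfile

def extend_context(config_path: str, max_len: int = 1010000) -> dict:
    """Raise max_position_embeddings in a model's config.json."""
    with open(config_path) as f:
        cfg = json.load(f)
    cfg["max_position_embeddings"] = max_len
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg

# Demo on a throwaway config file.
demo = os.path.join(tempfile.mkdtemp(), "config.json")
with open(demo, "w") as f:
    json.dump({"max_position_embeddings": 262144}, f)
cfg = extend_context(demo)
```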
```yaml
models:
  - model: Qwen/Qwen3-Coder-30B-A3B-Instruct
merge_method: nearswap
base_model: Qwen3-30B-A3B-Mixture-V2
parameters:
  t:
    - value: 0.001
tokenizer_source: base
dtype: bfloat16
```
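Our understanding of the nearswap idea, sketched below on flat vectors: donor (code-model) values blend into the base only where the two tensors already agree closely, with the threshold `t` bounding the blend strength, so a tiny `t` like 0.001 keeps the base model almost intact. This is an illustrative reimplementation under those assumptions, not mergekit's exact code:

```python
def nearswap(base, donor, t):
    """Blend donor values into base where the two are already close.

    The blend weight toward the donor scales with t and inversely with
    the element-wise difference, capped at full replacement.
    """
    out = []
    for b, d in zip(base, donor):
        diff = abs(b - d)
        w = 1.0 if diff == 0 else min(1.0, t / diff)
        out.append((1 - w) * b + w * d)
    return out
```

With `t = 0.001`, elements that differ substantially from the base shift by at most about `t`, which matches the observation that the base model's character is preserved.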
Model merging inevitably risks some performance degradation: even in non-thinking mode, the model may produce lengthy reasoning chains when faced with challenging problems; in thinking mode, it may output reasoning chains without final summaries.
We are committed to optimizing this in the next version and encourage developers to fine-tune this model or use it as a base for distilling larger models to further enhance performance.