SWE-bench Verified Repro
#30, opened by vim-ary
Both Qwen3-Coder-Flash and Qwen3-Coder have outstanding out-of-the-box scores on the SWE task.
Can you share the details of how the score of 51.6 was obtained for Qwen3-Coder-30B-A3B-Instruct on SWE-bench Verified with the OpenHands (100 turns) scaffolding?
OpenHands is an evolving library with many releases, and each release can significantly affect the final metrics.
- Which commit did you use?
- Can you provide the whole config.toml and cli command?
Qwen3-Coder-30B-A3B-Instruct also has flexibility in how it is configured and served:
- Which vllm version did you deploy it with? Ideally, it would be nice to have the full `vllm serve` command.
- Did you use the old 'qwen_xml' tool call parser or the more recent 'qwen_coder'? Results vary between parsers.
- Which max_seq_len did you use?
Thank you very much in advance!