SWE-bench Verified Repro

#30
by vim-ary - opened

Both Qwen3-Coder-Flash and Qwen3-Coder have outstanding out-of-the-box scores on the SWE task.

Can you share the details of how the score of 51.6 was obtained for Qwen3-Coder-30B-A3B-Instruct on SWE-bench Verified with the OpenHands (100 turns) scaffolding?

The OpenHands repository evolves quickly and has many releases, and the release used strongly affects the final metrics.

  • Which commit did you use?
  • Can you provide the full config.toml and the CLI command?

Qwen3-Coder-30B-A3B-Instruct can also be configured in several ways:

  • Which vLLM version did you deploy it with? Ideally, could you share the full vllm serve command?
  • Did you use the older 'qwen_xml' tool call parser or the more recent 'qwen_coder'? Results vary between parsers.
  • Which max_seq_len did you use?

Thank you very much in advance!
