mradermacher/Qwen3-14B-ARPO-DeepSearch-i1-GGUF Reinforcement Learning • 15B • Updated Aug 12 • 212 • 1