UnifiedReward-Think-qwen3vl-8b is the first unified multimodal chain-of-thought (CoT) reward model. It performs multi-dimensional, step-by-step long-chain reasoning on reward tasks for both visual understanding and visual generation.
For further details, please refer to the following resources:
All inference code is provided on GitHub.
@article{unifiedreward-think,
  title={Unified multimodal chain-of-thought reward model through reinforcement fine-tuning},
  author={Wang, Yibin and Li, Zhimin and Zang, Yuhang and Wang, Chunyu and Lu, Qinglin and Jin, Cheng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2505.03318},
  year={2025}
}