Model Summary

UnifiedReward-Think-qwen3vl-8b is the first unified multimodal CoT reward model, capable of multi-dimensional, step-by-step long-chain reasoning for both visual understanding and generation reward tasks.

For further details, please refer to the following resources:

πŸš€ All inference code is provided at Github.

Citation

@article{unifiedreward-think,
  title={Unified multimodal chain-of-thought reward model through reinforcement fine-tuning},
  author={Wang, Yibin and Li, Zhimin and Zang, Yuhang and Wang, Chunyu and Lu, Qinglin and Jin, Cheng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2505.03318},
  year={2025}
}
Downloads last month
114
Safetensors
Model size
9B params
Tensor type
BF16
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for CodeGoat24/UnifiedReward-Think-qwen3vl-8b

Finetuned
(2)
this model
Quantizations
2 models

Collection including CodeGoat24/UnifiedReward-Think-qwen3vl-8b