Liang0223/Qwen-2.5-Math-1.5B-DPO

Tags: Text Generation, text-generation-inference


This model was presented in the paper "On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification".

Code: https://github.com/yongliang-wu/DFT?tab=readme-ov-file


Weights format: Safetensors
Model size: 2B params
Tensor type: BF16

Collection including Liang0223/Qwen-2.5-Math-1.5B-DPO: DFT (6 items)