DFT
This model was presented in the paper "On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification".
Code: https://github.com/yongliang-wu/DFT?tab=readme-ov-file