Co-rewarding is a novel self-supervised RL framework that improves training stability by seeking complementary supervision from other views.
- TMLR-Group-HF/Co-rewarding-RephrasedMATH
  Viewer • 7.5k • 73
- TMLR-Group-HF/Co-rewarding-I-Qwen2.5-3B-MATH
  Text Generation • 3B • 15 • 1
- TMLR-Group-HF/Co-rewarding-I-Qwen2.5-7B-MATH
  Text Generation • 8B • 24 • 1
- TMLR-Group-HF/Co-rewarding-I-Qwen3-1.7B-Base-MATH
  Text Generation • 2B • 18