Co-rewarding is a novel self-supervised RL framework that improves training stability by seeking complementary supervision from other views.
- TMLR-Group-HF/Co-rewarding-RephrasedMATH
- TMLR-Group-HF/Co-rewarding-I-Qwen2.5-3B-MATH (Text Generation, 3B)
- TMLR-Group-HF/Co-rewarding-I-Qwen2.5-7B-MATH (Text Generation, 8B)
- TMLR-Group-HF/Co-rewarding-I-Qwen3-1.7B-Base-MATH (Text Generation, 2B)