stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated Jun 13 • 3
stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated Jun 13 • 1
stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step50 Text Generation • 8B • Updated Jun 13 • 1
stellalisy/rethink_rlvr_reproduce-format-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated Jun 13 • 1
stellalisy/rethink_rlvr_reproduce-format-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated Jun 13
stellalisy/rethink_rlvr_reproduce-format-qwen2.5_math_7b-lr5e-7-kl0.00-step50 Text Generation • 8B • Updated Jun 13
stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated Jun 13 • 1
stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated Jun 13 • 6
stellalisy/rethink_rlvr_reproduce-random-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated Jun 13 • 4
stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step50 Text Generation • 8B • Updated Jun 13 • 6
stellalisy/rethink_rlvr_reproduce-random-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated Jun 13 • 1
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated Jun 13 • 44
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated Jun 13
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step50 Text Generation • 8B • Updated Jun 13
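
The repository names above appear to follow one pattern: a reward label (incorrect, format, majority_vote, random, ground_truth), a base model tag, then `lr`, `kl`, and `step` fields. Below is a minimal Python sketch for splitting a name into those fields. The field meanings (learning rate, KL coefficient, training step) are read off the names themselves and are assumptions, not something the listing states. `Checkpoint` and `parse_repo_id` are hypothetical helper names introduced only for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed naming pattern, inferred only from the listing above:
#   <owner>/rethink_rlvr_reproduce-<reward>-<base_model>-lr<lr>-kl<kl>-step<step>
REPO_PATTERN = re.compile(
    r"^(?P<owner>[^/]+)/rethink_rlvr_reproduce-"
    r"(?P<reward>[a-z_]+)-"
    r"(?P<base_model>[^-]+)-"
    r"lr(?P<lr>\d+(?:\.\d+)?(?:e-?\d+)?)-"
    r"kl(?P<kl>\d+(?:\.\d+)?)-"
    r"step(?P<step>\d+)$"
)


class Checkpoint(NamedTuple):
    """Fields recovered from a repo name (interpretation is an assumption)."""
    owner: str
    reward: str
    base_model: str
    learning_rate: float
    kl_coef: float
    step: int


def parse_repo_id(repo_id: str) -> Optional[Checkpoint]:
    """Return the parsed fields, or None if the name does not fit the pattern."""
    match = REPO_PATTERN.match(repo_id.strip())
    if match is None:
        return None
    return Checkpoint(
        owner=match["owner"],
        reward=match["reward"],
        base_model=match["base_model"],
        learning_rate=float(match["lr"]),
        kl_coef=float(match["kl"]),
        step=int(match["step"]),
    )


if __name__ == "__main__":
    example = (
        "stellalisy/rethink_rlvr_reproduce-ground_truth-"
        "qwen2.5_math_7b-lr5e-7-kl0.00-step150"
    )
    print(parse_repo_id(example))
```

Parsing the names this way makes it easy to group the checkpoints by reward label or sort them by step. Names that do not fit the assumed pattern return `None` rather than raising.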