daixuancheng/ppo_sac_static0.1_constrainbyadv_step-200_actor Text Generation • 8B • Updated Jun 24, 2025 • 3
daixuancheng/rerun_qwen-math-7b_noSuffix_base_step280 Text Generation • 8B • Updated Jun 24, 2025 • 3
daixuancheng/fix-entropy-1e-3_train_math_global_step_180 Text Generation • 8B • Updated Jun 24, 2025 • 3
daixuancheng/rerun_sac-init0.1_qwen-math-7b_constrainbyAdv_step260 Text Generation • 8B • Updated Jun 24, 2025 • 3
daixuancheng/rerun_qwen-math-7b_noSuffix_base_step260 Text Generation • 8B • Updated Jun 24, 2025 • 3
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step180_crtic Text Generation • 8B • Updated Jun 24, 2025 • 5
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step180_actor Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/zero_qwen-math-7b_base_allDapo_mathVerify_yesSuffix_step220 Text Generation • 8B • Updated Jun 24, 2025 • 3
daixuancheng/sac-init0.4_qwen-math-7b_constrainbyAdv_yesSuffix_step220 Text Generation • 8B • Updated Jun 24, 2025 • 3
daixuancheng/zero_7b_base_useTokenLoss_clipHigh_KLcoeff0_step160 Text Generation • 8B • Updated Jun 24, 2025 • 4
daixuancheng/sac_static0.4_constrainbyAdv_step160_bak Text Generation • 8B • Updated Jun 24, 2025 • 3
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-180_critic Text Generation • 8B • Updated Jun 24, 2025 • 4
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-180_actor Text Generation • 8B • Updated Jun 23, 2025 • 4
daixuancheng/fix-entropy-1e-3_train_math_global_step_160 Text Generation • 8B • Updated Jun 23, 2025 • 3
daixuancheng/fix-entropy-1e-3_train_math_global_step_140 Text Generation • 8B • Updated Jun 23, 2025 • 3