Regarding the training details of caijanfeng/Qwen-2.5-7B-Simple-RL

by ZHHHR - opened Apr 28

May I ask whether the training of caijanfeng/Qwen-2.5-7B-Simple-RL was conducted purely with reinforcement learning (RL), or was it combined with supervised fine-tuning (SFT)? Additionally, what dataset was used for the RL training? Thank you very much!
