In a Training Loop 🔄

Rui Yang PRO

Ray2333

https://yangrui2015.github.io

YangRui2015

AI & ML interests

Deep Reinforcement Learning

Recent Activity

updated a collection 37 minutes ago

updated a collection 38 minutes ago

Collections 2

Papers 8

arxiv:2602.22190

arxiv:2510.27623

arxiv:2510.12693

arxiv:2506.03143

Models 20

Ray2333/Qwen2.5-VL-3B-sft-reasoning_and_grounding_changecoord_cpt636

4B • Updated 41 minutes ago

Ray2333/Qwen2.5-VL-3B-weighted_sft_ratio2-reasoning_and_grounding_changecoord_mixnoreasoning_cpt636

4B • Updated 42 minutes ago

Ray2333/Qwen2.5-VL-7B-weighted_sft_ratio2-reasoning_and_grounding_changecoord_mixnoreasoning_cpt636

8B • Updated 42 minutes ago

Ray2333/Qwen2.5-VL-7B-sft-reasoning_and_grounding_changecoord_cpt636

8B • Updated 43 minutes ago

Ray2333/GUI-Libra-3B

4B • Updated Mar 1 • 2

Ray2333/GRM-Llama3.2-3B-rewardmodel-ft

Text Classification • 3B • Updated Apr 30, 2025 • 1.65k • 13

Ray2333/Gemma-2B-rewardmodel-baseline

Text Classification • 3B • Updated Feb 5, 2025 • 24 • 2

Ray2333/GRM-llama3-8B-distill

Text Classification • 8B • Updated Feb 5, 2025 • 40 • 6

Ray2333/reward-model-Mistral-7B-instruct-Unified-Feedback

Text Classification • 7B • Updated Feb 5, 2025 • 904 • 11

Ray2333/GRM-Gemma-2B-rewardmodel-ft

3B • Updated Feb 5, 2025 • 677 • 1

Datasets 4

Ray2333/Libra-81K-SFT

Updated 25 days ago • 33

Ray2333/Offline_Evaluation

Viewer • Updated Feb 22 • 35.2k • 19

Ray2333/Libra-81K

Viewer • Updated Feb 20 • 738 • 49

Ray2333/RiC_harmless_helpful

Viewer • Updated Jul 12, 2024 • 291k • 16