GUI-Libra: Training GUI agents with augmented reasoning data and a tailored post-training recipe
Ray2333/GUI-Libra-3B 4B • Updated Mar 1 • 2
Ray2333/Libra-81K-SFT Updated 25 days ago • 33
Ray2333/Offline_Evaluation Viewer • Updated Feb 22 • 35.2k • 19
Ray2333/Libra-81K Viewer • Updated Feb 20 • 738 • 49
GRM: Generalizable Reward Models
Ray2333/GRM-llama3-8B-sftreg Text Classification • 8B • Updated Feb 5, 2025 • 10 • 5
Ray2333/GRM-llama3-8B-distill Text Classification • 8B • Updated Feb 5, 2025 • 40 • 6
Ray2333/GRM-Gemma-2B-sftreg Text Classification • 3B • Updated Feb 5, 2025 • 164 • 3
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs Paper • 2406.10216 • Published Jun 14, 2024 • 2
Ray2333/Qwen2.5-VL-3B-weighted_sft_ratio2-reasoning_and_grounding_changecoord_mixnoreasoning_cpt636 4B • Updated 42 minutes ago
Ray2333/Qwen2.5-VL-7B-weighted_sft_ratio2-reasoning_and_grounding_changecoord_mixnoreasoning_cpt636 8B • Updated 42 minutes ago
Ray2333/GRM-Llama3.2-3B-rewardmodel-ft Text Classification • 3B • Updated Apr 30, 2025 • 1.65k • 13
Ray2333/Gemma-2B-rewardmodel-baseline Text Classification • 3B • Updated Feb 5, 2025 • 24 • 2
Ray2333/reward-model-Mistral-7B-instruct-Unified-Feedback Text Classification • 7B • Updated Feb 5, 2025 • 904 • 11