Article • The Engineering Handbook for GRPO + LoRA with Verl: Training Qwen2.5 on Multi-GPU • Jan 2 • 13
Optimizing Diversity and Quality through Base-Aligned Model Collaboration Paper • 2511.05650 • Published Nov 7, 2025 • 6
meta-llama/Llama-3.2-11B-Vision-Instruct Image-Text-to-Text • 11B • Updated Dec 4, 2024 • 181k • 1.56k