From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones Paper • 2509.25123 • Published Sep 29, 2025 • 20
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025 • 190
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Paper • 2509.02544 • Published Sep 2, 2025 • 124
hanbin/Llama-3.1-8B-pretrain-1-pes2o-anneal-1B_oasst1_wildchat Text Generation • 8B • Updated Jul 29, 2025
hanbin/Llama-3.1-8B-pes2o-anneal-2.7B_oasst1_wildchat Text Generation • 8B • Updated Jul 29, 2025 • 1