arxiv:2510.11370
Wenhan Ma
CuteNPC
AI & ML interests
Large Language Model
Recent Activity
upvoted a paper 1 day ago
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
liked a model 16 days ago
Lansechen/deepseek-v2-lite-16b-chat-R1-Distill-bs17k-batch32
authored a paper about 1 month ago
Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers
Organizations
None yet