Which RL technique is used when training Command A?
Hey @MRU4913
You can find all the details in our tech report: https://arxiv.org/abs/2504.00698
@alexrs I mean the RL stage of Command A Reasoning.