Which RL is used by Command A?

#3
by MRU4913 - opened

Which RL technique is used when training Command A?

Cohere Labs org

Hey @MRU4913

You can find all the details in our tech report! -- https://arxiv.org/abs/2504.00698

@alexrs
I mean the RL stage of Command A Reasoning.
