Will data be open sourced?

#15
by Sourajit123 - opened

Can you be legendary and also open-source the data + training code used?

WeiboAI org

We are considering releasing more training details, data recipes, and RL code in the future, but we don’t have a fixed timeline yet.

What we can say for now is that the data sources we used are open-source / public datasets, with filtering, synthesis, verification, and decontamination on top. Before releasing anything, we still need to clean up the pipeline, check licenses, and make sure the released version is actually useful and reproducible.

Thanks a lot! That would be super helpful.

@lsx666 Just checking in: has there been any progress on releasing the training details and data recipes?