---
license: cc-by-nc-sa-4.0
language:
- en
- zh
size_categories:
- n>1T
tags:
- robotics
- real-world
- dual-arm
- whole body control
- manipulation
datasets:
- OpenGalaxea/Galaxea-Open-World-Dataset
---

# 🚀 Galaxea Open-World Dataset and G0 Dual-System VLA Model

- [Project page](https://opengalaxea.github.io/G0/)
- [arXiv paper](https://arxiv.org/abs/2509.00576)
- [Visualizer](https://opengalaxea.github.io/G0/visualizer/index.html)
- [ModelScope](https://www.modelscope.cn/organization/Galaxea)
- [X](https://x.com/Galaxea_x)
- [LinkedIn](https://www.linkedin.com/company/galaxeadynamics/posts/?feedView=all&viewAsMember=true)

G0-VLA architecture and training pipeline. Stage 1 pre-trains a vision-language model on cross-embodiment data in an autoregressive manner. Stage 2 and post-training share the same model structure and are trained on Galaxea open-world data with embodiment-specific views and both high-level and subtask instructions, supervising the Action Transformer's action reconstruction with a flow-matching loss.

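
To make the flow-matching supervision mentioned above more concrete, here is a minimal sketch of such a loss for a single action chunk. It assumes a linear interpolation path between Gaussian noise and the ground-truth actions; the exact path, time sampling, and sign convention used by G0 are not stated here and may differ. The names `flow_matching_loss` and `predict_velocity` are hypothetical, with `predict_velocity` standing in for the Action Transformer.

```python
import random


def flow_matching_loss(actions, predict_velocity, rng=random):
    """Single-sample estimate of a linear-path flow-matching loss.

    actions: list of floats (a flattened ground-truth action chunk).
    predict_velocity: hypothetical stand-in for the Action Transformer;
        maps (noisy_actions, t) to a predicted velocity of the same length.
    rng: the random module or a random.Random instance.
    """
    # Sample Gaussian noise and a time t in [0, 1).
    noise = [rng.gauss(0.0, 1.0) for _ in actions]
    t = rng.random()
    # Assumed path: pure noise at t=0, clean actions at t=1.
    noisy = [(1.0 - t) * n + t * a for n, a in zip(noise, actions)]
    # Along this path the target velocity is constant: actions - noise.
    target = [a - n for n, a in zip(noise, actions)]
    pred = predict_velocity(noisy, t)
    # Mean squared error between predicted and target velocity.
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(actions)
```

In training, this quantity would be averaged over many action chunks, noise samples, and time steps, with the model additionally conditioned on images and instructions, which this sketch omits.
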

In this repo, you can find:

- [x] `G0_3B_base.pt`: **Model weights after Stage 2 pre-training**
- [x] `G0_3B_base_dataset_statistics`: **Statistics of the dataset used in pre-training**

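
As a rough illustration of how the weights file might be inspected, the sketch below loads `G0_3B_base.pt` with PyTorch and counts its parameters. It assumes the file is a flat mapping from parameter names to tensors; the actual checkpoint layout is not documented here and may be nested (for example under a top-level key), so treat this as a starting point. The helper names `load_checkpoint` and `count_parameters` are hypothetical.

```python
def load_checkpoint(path="G0_3B_base.pt"):
    """Load the checkpoint onto the CPU (requires PyTorch)."""
    # Imported lazily so count_parameters works without PyTorch installed.
    import torch

    # map_location="cpu" avoids needing a GPU just to inspect the file.
    return torch.load(path, map_location="cpu")


def count_parameters(state_dict):
    """Sum element counts over entries that expose numel(); skip the rest."""
    return sum(v.numel() for v in state_dict.values() if hasattr(v, "numel"))
```

With the file downloaded locally, `count_parameters(load_checkpoint())` gives a quick sanity check on the model size, provided the flat-mapping assumption holds.
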

## 📜 Citation

All data and code in this repo are released under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). If you use our dataset or models, please cite:

```bibtex
@article{galaxea2025,
  title={Galaxea G0: Open-World Dataset and Dual-System VLA Model},
  author={Galaxea Team},
  journal={arXiv preprint arXiv:2509.00576},
  year={2025}
}