---
license: mit
---

# **AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation**

**This repository hosts pre-trained checkpoints for AeroReformer.**
For the **full implementation, training and evaluation scripts**, please see the official codebase:
**GitHub:** [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)

> Paper: [AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation](https://arxiv.org/pdf/2502.16680)

AeroReformer is a vision-language framework for UAV-based referring image segmentation (UAV-RIS), featuring:

* **VLCAM**: Vision-Language Cross-Attention Module for robust multimodal grounding.
* **RAMSF**: Rotation-Aware Multi-Scale Fusion decoder for aerial scenes with large scale/rotation variance.

---

## 📦 What's in this repo?

Pre-trained model weights (PyTorch `.pth`) for:

* **UAVid-RIS** (best checkpoint)
* **VDD-RIS** (best checkpoint)

All training/inference code lives in the GitHub repo.

---

## 🚀 How to use these weights

### 1) Download with `huggingface_hub`

Fetch the `.pth` checkpoint you need from this repository with the `huggingface_hub` library (for example, its `hf_hub_download` function), and note the local path it returns.

### 2) Load into the official implementation

Follow the **loading utilities and model definitions** from the GitHub repo:

* Code & instructions: [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)
* Typical usage: run the repo's `test.py` / `train.py` with `--resume` pointing to the checkpoint path you just downloaded.

Example (UAVid-RIS testing), **run in the code repo**:

```bash
python test.py --swin_type base --dataset uavid_ris \
  --resume /path/to/model_best_AeroReformer.pth \
  --model_id AeroReformer --split test --workers 4 --window12 \
  --img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4
```

---

## ✅ Recommended environments

* **Python:** 3.10
* **PyTorch:** 2.3.1
* **CUDA:** 12.4 (or adapt to your setup)
* **OS:** Ubuntu (verified by authors)

> Exact dependency setup and `requirements.txt` are maintained in the GitHub repo.

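
To make steps 1 and 2 of "How to use these weights" concrete, here is a minimal Python sketch. It assumes `huggingface_hub` is installed. The repo id and the checkpoint filename are placeholders, not confirmed names: check this repository's file listing for the real values. The helper names `download_checkpoint` and `build_test_command` are hypothetical and are not part of the official codebase; the second one only reassembles the `test.py` command shown above.

```python
import shlex
from pathlib import Path

# Assumption: placeholder repo id. Replace with this model repository's actual id.
REPO_ID = "your-namespace/your-aeroreformer-repo"
# Assumption: filename borrowed from the test.py example; the hosted file may differ.
CKPT_FILENAME = "model_best_AeroReformer.pth"


def download_checkpoint(repo_id, filename, local_dir="checkpoints"):
    """Download one checkpoint file from the Hub and return its local path."""
    # Imported lazily so the rest of this sketch works without the dependency.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=repo_id, filename=filename, local_dir=local_dir
    )
    return Path(local_path)


def build_test_command(ckpt_path, data_root="./data/UAVid_RIS/"):
    """Rebuild the UAVid-RIS test command with --resume set to ckpt_path."""
    args = [
        "python", "test.py",
        "--swin_type", "base",
        "--dataset", "uavid_ris",
        "--resume", str(ckpt_path),
        "--model_id", "AeroReformer",
        "--split", "test",
        "--workers", "4",
        "--window12",
        "--img_size", "480",
        "--refer_data_root", data_root,
        "--mha", "4-4-4-4",
    ]
    # shlex.join quotes any argument containing spaces or shell metacharacters.
    return shlex.join(args)
```

A typical session would call `ckpt = download_checkpoint(REPO_ID, CKPT_FILENAME)` and then run the string returned by `build_test_command(ckpt)` from inside the cloned GitHub repo.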
---

## 📂 Datasets

Experiments use **UAVid-RIS** and **VDD-RIS**. Referring expressions were generated by **Qwen** and **LLaMA** and may include **errors/inconsistencies**.

* Text labels:
  * [lironui/UAVid-RIS](https://huggingface.co/datasets/lironui/UAVid-RIS)
  * [lironui/VDD-RIS](https://huggingface.co/datasets/lironui/VDD-RIS)
* Raw imagery:
  * UAVid: [https://uavid.nl/](https://uavid.nl/)
  * VDD: [https://huggingface.co/datasets/RussRobin/VDD](https://huggingface.co/datasets/RussRobin/VDD)

Preprocessing and split scripts are provided in the GitHub repo.

---

## ⚠️ Notes & Limitations

* Checkpoints here are **frozen artifacts**; for training/fine-tuning or code changes, use the GitHub repo.
* Performance depends on preprocessing, image size, and split strategy; follow the original scripts for reproducibility.
* Auto-generated text expressions may contain **ambiguity/noise**.

---

## 🙏 Acknowledgements

AeroReformer builds upon:

* [LAVT](https://github.com/yz93/LAVT-RIS)
* [RMSIN](https://github.com/Lsan2401/RMSIN)

---

## 📜 Citation

```bibtex
@article{AeroReformer2025,
  title   = {AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation},
  author  = {},
  journal = {arXiv preprint arXiv:2502.16680},
  year    = {2025}
}
```

---

## 🔐 License

Unless stated otherwise, weights are distributed under **MIT**. Please also respect the licenses/terms of the underlying datasets and the Swin-Transformer weights.

---