AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation
This repository hosts pre-trained checkpoints for AeroReformer. For the full implementation, training and evaluation scripts, please see the official codebase:
GitHub: https://github.com/lironui/AeroReformer
Paper: AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation
AeroReformer is a vision-language framework for UAV-based referring image segmentation (UAV-RIS), featuring:
- VLCAM: Vision-Language Cross-Attention Module for robust multimodal grounding.
- RAMSF: Rotation-Aware Multi-Scale Fusion decoder for aerial scenes with large scale/rotation variance.
What's in this repo?
Pre-trained model weights (PyTorch .pth) for:
- UAVid-RIS (best checkpoint)
- VDD-RIS (best checkpoint)
All training/inference code lives in the GitHub repo.
How to use these weights
1) Download with huggingface_hub
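A minimal sketch of the download step, assuming the checkpoint is stored in this Hugging Face repo under the file name used in the testing example (model_best_AeroReformer.pth). The repo id is a placeholder, not the real one; replace it with the id shown on this model page. download_checkpoint is a hypothetical helper name; hf_hub_download is the huggingface_hub function that performs the download.

```python
def download_checkpoint(repo_id, filename, local_dir=None):
    """Fetch one file from the Hugging Face Hub and return its local path."""
    # Imported lazily so the helper can be defined without the package installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)


# Placeholder values (assumptions): replace with the real repo id and file name.
REPO_ID = "<user>/<repo>"
FILENAME = "model_best_AeroReformer.pth"

# Example call (needs network access and `pip install huggingface_hub`):
# ckpt_path = download_checkpoint(REPO_ID, FILENAME)
# print(ckpt_path)
```

The returned path is the value to pass to --resume in the official scripts.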
2) Load into the official implementation
Follow the loading utilities and model definitions from the GitHub repo:
- Code & instructions: https://github.com/lironui/AeroReformer
- Typical usage: run test.py or train.py from the repo with --resume pointing to the checkpoint path you just downloaded.
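Before passing a file to --resume, it can help to check what the checkpoint contains. The sketch below assumes the .pth file is a regular PyTorch checkpoint readable with torch.load; the key layout is not documented here, so the code only reports whatever top-level entries it finds. summarize_checkpoint and load_checkpoint are hypothetical helper names, not part of the official codebase.

```python
def summarize_checkpoint(obj, max_items=5):
    """Describe the top-level layout of an already loaded checkpoint object."""
    if not isinstance(obj, dict):
        return {"type": type(obj).__name__}
    summary = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            # Nested dicts (for example a state dict) are summarized by size
            # and by their first few keys.
            summary[key] = f"dict with {len(value)} entries, e.g. {list(value)[:max_items]}"
        else:
            summary[key] = type(value).__name__
    return summary


def load_checkpoint(path):
    """Load a checkpoint onto the CPU (assumes PyTorch is installed)."""
    import torch  # imported lazily so summarize_checkpoint works without it

    return torch.load(path, map_location="cpu")


# Example call, with the path returned by the download step:
# print(summarize_checkpoint(load_checkpoint("/path/to/model_best_AeroReformer.pth")))
```

For actual inference, rely on the model definitions and loading utilities in the GitHub repo rather than on this inspection helper.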
Example (UAVid-RIS testing), run in the code repo:
python test.py --swin_type base --dataset uavid_ris \
--resume /path/to/model_best_AeroReformer.pth \
--model_id AeroReformer --split test --workers 4 --window12 \
--img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4
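To launch the same test command from Python (for example, to loop over several checkpoints), the argument list can be assembled as follows. This is a sketch: build_test_command is a hypothetical helper, and it only uses flags that appear in the example above. Check test.py in the GitHub repo for the authoritative option list.

```python
import shlex


def build_test_command(checkpoint_path, data_root="./data/UAVid_RIS/", workers=4):
    """Return the test.py argument list from the UAVid-RIS example."""
    return [
        "python", "test.py",
        "--swin_type", "base",
        "--dataset", "uavid_ris",
        "--resume", str(checkpoint_path),
        "--model_id", "AeroReformer",
        "--split", "test",
        "--workers", str(workers),
        "--window12",
        "--img_size", "480",
        "--refer_data_root", data_root,
        "--mha", "4-4-4-4",
    ]


cmd = build_test_command("/path/to/model_best_AeroReformer.pth")
# Print the command as a single shell-safe string for inspection.
print(shlex.join(cmd))
```

Passing cmd to subprocess.run(cmd, check=True) from inside the code repo would start the evaluation.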
Recommended environments
- Python: 3.10
- PyTorch: 2.3.1
- CUDA: 12.4 (or adapt to your setup)
- OS: Ubuntu (verified by authors)
Exact dependency setup and requirements.txt are maintained in the GitHub repo.
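A quick way to compare a local setup against the versions listed above is sketched below. The helper names are hypothetical, and the prefix comparison is a simplification (a build such as 2.3.1+cu121 counts as matching 2.3.1). PyTorch is optional here: if it is not installed, the report marks it as missing.

```python
import platform

# Versions taken from the list above.
RECOMMENDED = {"python": "3.10", "torch": "2.3.1", "cuda": "12.4"}


def describe_environment():
    """Collect the local Python, PyTorch, and CUDA build versions."""
    info = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        import torch
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None for CPU-only builds
    return info


def report(info):
    """Return one line per component, flagging versions that differ."""
    lines = []
    for key, want in RECOMMENDED.items():
        have = info.get(key)
        status = "ok" if have is not None and str(have).startswith(want) else "differs"
        lines.append(f"{key}: found {have}, recommended {want} ({status})")
    return lines


for line in report(describe_environment()):
    print(line)
```

A "differs" line is not necessarily a problem, since the CUDA version can be adapted to the local setup.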
Datasets
Experiments use UAVid-RIS and VDD-RIS. Referring expressions were generated by Qwen and LLaMA and may include errors/inconsistencies.
Text labels:
Raw imagery:
Preprocessing and split scripts are provided in the GitHub repo.
Notes & Limitations
- Checkpoints here are frozen artifacts; for training/fine-tuning or code changes, use the GitHub repo.
- Performance depends on preprocessing, image size, and split strategy; follow the original scripts for reproducibility.
- Auto-generated text expressions may contain ambiguity/noise.
Acknowledgements
AeroReformer builds upon:
Citation
@article{AeroReformer2025,
title = {AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation},
author = {<Author List>},
journal = {arXiv preprint arXiv:2502.16680},
year = {2025}
}
License
Unless stated otherwise, weights are distributed under MIT. Please also respect the licenses/terms of the underlying datasets and the Swin-Transformer weights.