---
license: mit
---
# **AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation**

**This repository hosts pre-trained checkpoints for AeroReformer.**
For the **full implementation, training and evaluation scripts**, please see the official codebase:

**GitHub:** [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)

> Paper: [AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation](https://arxiv.org/pdf/2502.16680)

AeroReformer is a vision-language framework for UAV-based referring image segmentation (UAV-RIS), featuring:

* **VLCAM**: Vision-Language Cross-Attention Module for robust multimodal grounding.
* **RAMSF**: Rotation-Aware Multi-Scale Fusion decoder for aerial scenes with large scale/rotation variance.

---
## What's in this repo?

Pre-trained model weights (PyTorch `.pth`) for:

* **UAVid-RIS** (best checkpoint)
* **VDD-RIS** (best checkpoint)

All training/inference code lives in the GitHub repo.

---
## How to use these weights

### 1) Download with `huggingface_hub`
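A minimal sketch of the download step, assuming the weights are hosted in a Hub repository named `lironui/AeroReformer` and that the UAVid-RIS checkpoint is called `model_best_AeroReformer.pth` (the name used in the test command in this card). Neither value is confirmed here, so check the repository's file listing and adjust both. `hf_hub_download` is the `huggingface_hub` function for fetching a single file; `download_checkpoint` is a hypothetical wrapper around it.

```python
from pathlib import Path

# Assumptions (not stated in this model card): the Hub repo id and the
# checkpoint filename. Replace both with the values from the file listing.
REPO_ID = "lironui/AeroReformer"          # hypothetical repo id
FILENAME = "model_best_AeroReformer.pth"  # name borrowed from the test command


def download_checkpoint(repo_id: str = REPO_ID,
                        filename: str = FILENAME,
                        local_dir: str = "./checkpoints") -> Path:
    """Fetch one checkpoint file from the Hugging Face Hub and return its local path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
    return Path(path)
```

Calling `download_checkpoint()` returns the local path of the file, which is the value to pass to `--resume`.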
### 2) Load into the official implementation

Follow the **loading utilities and model definitions** from the GitHub repo:

* Code & instructions: [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)
* Typical usage: run the repo's `test.py` or `train.py` with `--resume` set to the path of the downloaded checkpoint.

Example (UAVid-RIS testing), **run from inside the code repo**:

```bash
python test.py --swin_type base --dataset uavid_ris \
  --resume /path/to/model_best_AeroReformer.pth \
  --model_id AeroReformer --split test --workers 4 --window12 \
  --img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4
```
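Before launching `test.py`, it can help to confirm that a downloaded file is readable and contains plausible parameter names. The sketch below is not part of the official codebase: `extract_state_dict` and `summarize_checkpoint` are hypothetical helpers, and the assumption that weights sit under a `model` or `state_dict` key (possibly with a `module.` prefix from multi-GPU training) should be checked against the repo's own loading code. Only load checkpoint files you trust, since `torch.load` unpickles their contents.

```python
from collections import OrderedDict
from collections.abc import Mapping


def extract_state_dict(checkpoint: Mapping) -> OrderedDict:
    """Return the parameter dict from a loaded checkpoint, without 'module.' prefixes."""
    # Assumption: weights are stored under "model" or "state_dict"; otherwise
    # the checkpoint itself is treated as the state dict.
    state = checkpoint
    for key in ("model", "state_dict"):
        if key in checkpoint and isinstance(checkpoint[key], Mapping):
            state = checkpoint[key]
            break
    cleaned = OrderedDict()
    for name, value in state.items():
        # DataParallel / DistributedDataParallel prepend "module." to every name.
        if name.startswith("module."):
            name = name[len("module."):]
        cleaned[name] = value
    return cleaned


def summarize_checkpoint(path: str, limit: int = 5) -> None:
    """Print the number of entries and the first few parameter names in a checkpoint."""
    import torch  # imported lazily; requires the PyTorch install used for the repo

    # Newer PyTorch releases may need weights_only=False here if the file
    # stores non-tensor objects alongside the weights.
    checkpoint = torch.load(path, map_location="cpu")
    state = extract_state_dict(checkpoint)
    print(f"{len(state)} entries")
    for name in list(state)[:limit]:
        print(" ", name)
```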

---

## Recommended environments

* **Python:** 3.10
* **PyTorch:** 2.3.1
* **CUDA:** 12.4 (or adapt to your setup)
* **OS:** Ubuntu (verified by authors)

> Exact dependency setup and `requirements.txt` are maintained in the GitHub repo.
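To compare a local setup against the versions listed above, a small check like the following can be run before training or testing. It is a sketch, not a script from the official repo: the helper names are hypothetical, and versions are compared by prefix only, so `2.3.1+cu121` counts as matching `2.3.1`. `mismatches(installed_versions())` returns the names of the components that differ.

```python
import platform

# Versions listed in this model card; other combinations may also work.
RECOMMENDED = {"python": "3.10", "torch": "2.3.1", "cuda": "12.4"}


def installed_versions() -> dict:
    """Collect local Python, PyTorch, and CUDA versions (None when unavailable)."""
    found = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        import torch
    except ImportError:
        return found
    found["torch"] = str(torch.__version__)
    found["cuda"] = torch.version.cuda  # None for CPU-only builds
    return found


def mismatches(found: dict, recommended: dict = RECOMMENDED) -> list:
    """List components whose installed version does not start with the recommended one."""
    return [name for name, want in recommended.items()
            if not (found.get(name) or "").startswith(want)]
```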

---

## Datasets

Experiments use **UAVid-RIS** and **VDD-RIS**.
Referring expressions were generated by **Qwen** and **LLaMA** and may include **errors/inconsistencies**.

* Text labels:

  * [lironui/UAVid-RIS](https://huggingface.co/datasets/lironui/UAVid-RIS)
  * [lironui/VDD-RIS](https://huggingface.co/datasets/lironui/VDD-RIS)
* Raw imagery:

  * UAVid: [https://uavid.nl/](https://uavid.nl/)
  * VDD: [https://huggingface.co/datasets/RussRobin/VDD](https://huggingface.co/datasets/RussRobin/VDD)

Preprocessing and split scripts are provided in the GitHub repo.
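The text-label repositories listed above can be fetched with `snapshot_download` from `huggingface_hub`. This is a sketch under assumptions: the dictionary keys, the `download_text_labels` helper, and the `./data/<name>` target folders are illustrative, and the layout expected by the training and test scripts (for example `./data/UAVid_RIS/` in the test command) is defined by the preprocessing scripts in the GitHub repo. Raw imagery must still be obtained from the UAVid and VDD sources.

```python
# Dataset repos named in this model card (text labels only).
TEXT_LABEL_REPOS = {
    "uavid_ris": "lironui/UAVid-RIS",
    "vdd_ris": "lironui/VDD-RIS",
}


def download_text_labels(name: str, root: str = "./data") -> str:
    """Download one text-label dataset repo into root/<name> and return that folder."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=TEXT_LABEL_REPOS[name],
        repo_type="dataset",           # these are dataset repos, not model repos
        local_dir=f"{root}/{name}",    # hypothetical folder name; adjust to the repo's layout
    )
```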

---

## Notes & Limitations

* Checkpoints here are **frozen artifacts**; for training/fine-tuning or code changes, use the GitHub repo.
* Performance depends on preprocessing, image size, and split strategy; follow the original scripts for reproducibility.
* Auto-generated text expressions may contain **ambiguity/noise**.

---

## Acknowledgements

AeroReformer builds upon:

* [LAVT](https://github.com/yz93/LAVT-RIS)
* [RMSIN](https://github.com/Lsan2401/RMSIN)

---
## Citation

```bibtex
@article{AeroReformer2025,
  title   = {AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation},
  author  = {<Author List>},
  journal = {arXiv preprint arXiv:2502.16680},
  year    = {2025}
}
```

---

## License

Unless stated otherwise, weights are distributed under **MIT**.
Please also respect the licenses/terms of the underlying datasets and the Swin-Transformer weights.

---