---
license: mit
---
# **AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation**
**This repository hosts pre-trained checkpoints for AeroReformer.**
For the **full implementation, training and evaluation scripts**, please see the official codebase:
**GitHub:** [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)
> Paper: [AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation](https://arxiv.org/pdf/2502.16680)
AeroReformer is a vision-language framework for UAV-based referring image segmentation (UAV-RIS), featuring:
* **VLCAM**: Vision-Language Cross-Attention Module for robust multimodal grounding.
* **RAMSF**: Rotation-Aware Multi-Scale Fusion decoder for aerial scenes with large scale/rotation variance.
---
## What's in this repo?
Pre-trained model weights (PyTorch `.pth`) for:
* **UAVid-RIS** (best checkpoint)
* **VDD-RIS** (best checkpoint)
All training/inference code lives in the GitHub repo.
---
## How to use these weights
### 1) Download with `huggingface_hub`
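A minimal sketch of the download step using `hf_hub_download` from `huggingface_hub`. The repo id below is an assumption (this page does not state it), and the file name is taken from the test command further down; check both against this repo's file list before running.

```python
from pathlib import Path

# Assumptions, not confirmed by this page: replace with the actual values.
REPO_ID = "lironui/AeroReformer"          # hypothetical Hub repo id
FILENAME = "model_best_AeroReformer.pth"  # name used in the test command; verify in the file list


def download_checkpoint(repo_id: str = REPO_ID,
                        filename: str = FILENAME,
                        local_dir: str = "./checkpoints") -> Path:
    """Download a single checkpoint file from the Hub and return its local path."""
    # Imported lazily so the module can be imported without the package installed;
    # install it with `pip install huggingface_hub`.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
    return Path(path)


if __name__ == "__main__":
    print(download_checkpoint())
```

The returned path is what you would pass to `--resume` in the next step.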
### 2) Load into the official implementation
Follow the **loading utilities and model definitions** from the GitHub repo:
* Code & instructions: [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)
* Typical usage: run the repo's `test.py` or `train.py`, pointing `--resume` at the checkpoint path you just downloaded.
Example (UAVid-RIS testing), **run in the code repo**:
```bash
python test.py --swin_type base --dataset uavid_ris \
--resume /path/to/model_best_AeroReformer.pth \
--model_id AeroReformer --split test --workers 4 --window12 \
--img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4
```
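Before running the command above, it can help to check what the downloaded file contains. The sketch below lists the top-level keys of a checkpoint without building the model. The internal layout of the checkpoint is not documented on this page, so the helper makes no assumption about key names; `summarize_checkpoint` and `load_checkpoint` are illustrative names, not part of the official code.

```python
from collections.abc import Mapping


def summarize_checkpoint(obj) -> list[str]:
    """Return the sorted top-level keys of a checkpoint, or [] if it is not a mapping."""
    if isinstance(obj, Mapping):
        return sorted(str(key) for key in obj.keys())
    return []


def load_checkpoint(path: str):
    """Load a .pth file onto the CPU so no GPU is needed for inspection."""
    import torch  # imported lazily; requires PyTorch

    # Newer PyTorch releases default to weights_only=True, which may reject
    # checkpoints that store non-tensor objects alongside the weights.
    return torch.load(path, map_location="cpu")


if __name__ == "__main__":
    checkpoint = load_checkpoint("/path/to/model_best_AeroReformer.pth")
    print(summarize_checkpoint(checkpoint))
```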
---
## Recommended environments
* **Python:** 3.10
* **PyTorch:** 2.3.1
* **CUDA:** 12.4 (or adapt to your setup)
* **OS:** Ubuntu (verified by authors)
> Exact dependency setup and `requirements.txt` are maintained in the GitHub repo.
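A quick way to compare a local setup against the versions listed above is sketched below. The helper names are illustrative, and a mismatch does not necessarily mean the code will fail; it only flags a difference from the recommended configuration.

```python
import sys

RECOMMENDED_PYTHON = (3, 10)
RECOMMENDED_TORCH = "2.3.1"


def python_matches(version_info=sys.version_info, expected=RECOMMENDED_PYTHON) -> bool:
    """True if the major.minor Python version equals the recommended one."""
    return tuple(version_info[:2]) == expected


def torch_matches(torch_version: str, expected: str = RECOMMENDED_TORCH) -> bool:
    """Compare the release part of a version string, ignoring build tags such as '+cu121'."""
    return torch_version.split("+")[0] == expected


if __name__ == "__main__":
    print("Python matches:", python_matches())
    try:
        import torch
        print("PyTorch matches:", torch_matches(torch.__version__))
        print("CUDA version of this PyTorch build:", torch.version.cuda)
    except ImportError:
        print("PyTorch is not installed")
```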
---
## Datasets
Experiments use **UAVid-RIS** and **VDD-RIS**.
Referring expressions were generated by **Qwen** and **LLaMA** and may include **errors/inconsistencies**.
* Text labels:
  * [lironui/UAVid-RIS](https://huggingface.co/datasets/lironui/UAVid-RIS)
  * [lironui/VDD-RIS](https://huggingface.co/datasets/lironui/VDD-RIS)
* Raw imagery:
  * UAVid: [https://uavid.nl/](https://uavid.nl/)
  * VDD: [https://huggingface.co/datasets/RussRobin/VDD](https://huggingface.co/datasets/RussRobin/VDD)
Preprocessing and split scripts are provided in the GitHub repo.
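The text-label datasets listed above can be fetched with `snapshot_download` from `huggingface_hub`, as in the sketch below. The target directory layout is an assumption; the preprocessing scripts in the GitHub repo may expect different folder names, so adjust `root` and the folder naming to match them.

```python
# Dataset repo ids as linked in the list above.
DATASET_REPOS = ("lironui/UAVid-RIS", "lironui/VDD-RIS")


def local_dir_for(repo_id: str, root: str = "./data") -> str:
    """Map a Hub repo id to a local folder (assumed layout: <root>/<repo name>)."""
    return f"{root}/{repo_id.split('/')[-1]}"


def download_text_labels(root: str = "./data") -> list[str]:
    """Download each text-label dataset and return the local folder paths."""
    from huggingface_hub import snapshot_download  # imported lazily

    return [
        snapshot_download(repo_id=repo_id, repo_type="dataset",
                          local_dir=local_dir_for(repo_id, root))
        for repo_id in DATASET_REPOS
    ]


if __name__ == "__main__":
    for path in download_text_labels():
        print(path)
```

The raw imagery is hosted separately (see the links above) and is not covered by this sketch.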
---
## Notes & Limitations
* Checkpoints here are **frozen artifacts**; for training/fine-tuning or code changes, use the GitHub repo.
* Performance depends on preprocessing, image size, and split strategy; follow the original scripts for reproducibility.
* Auto-generated text expressions may contain **ambiguity/noise**.
---
## Acknowledgements
AeroReformer builds upon:
* [LAVT](https://github.com/yz93/LAVT-RIS)
* [RMSIN](https://github.com/Lsan2401/RMSIN)
---
## Citation
```bibtex
@article{AeroReformer2025,
title = {AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation},
author = {<Author List>},
journal = {arXiv preprint arXiv:2502.16680},
year = {2025}
}
```
---
## License
Unless stated otherwise, weights are distributed under the **MIT License**.
Please also respect the licenses/terms of the underlying datasets and the Swin-Transformer weights.
---