Update README.md

README.md (CHANGED)

@@ -4,27 +4,197 @@ license: mit
Removed from the previous README:

* **Automatic Labeling Pipeline**: leverages pre-existing UAV segmentation datasets and Multimodal Large Language Models (MLLMs) to generate textual descriptions.
* **Vision-Language Cross-Attention Module (VLCAM)**: enables efficient cross-modal interaction for improved referring segmentation.
* **Rotation-Aware Multi-Scale Fusion (RAMSF) Decoder**: improves segmentation robustness to the scale variations and rotations of aerial imagery.
* **New UAV-RIS Benchmark**: sets a new standard for UAV-based referring segmentation through extensive evaluation on multiple datasets.
# **AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation**
---
tags:
- image-segmentation
- referring-image-segmentation
- vision-language
- aerial-imagery
- uav
- pytorch
license: apache-2.0
library_name: pytorch
datasets:
- lironui/UAVid-RIS
- lironui/VDD-RIS
pipeline_tag: image-segmentation
---

# AeroReformer (Pre-trained Checkpoints)

**This repository hosts pre-trained checkpoints for AeroReformer.**
For the **full implementation, training, and evaluation scripts**, see the official codebase:
**GitHub:** [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)

> Paper: [AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation](https://arxiv.org/pdf/2502.16680)

AeroReformer is a vision-language framework for UAV-based referring image segmentation (UAV-RIS), featuring:

* **VLCAM**: a Vision-Language Cross-Attention Module for robust multimodal grounding.
* **RAMSF**: a Rotation-Aware Multi-Scale Fusion decoder for aerial scenes with large scale and rotation variance.

---

## What's in this repo?

Pre-trained model weights (PyTorch `.pth`) for:

* **UAVid-RIS** (best checkpoint)
* **VDD-RIS** (best checkpoint)
* (Optional) Swin Transformer init weights, mirrored for convenience

> Filenames (example; adapt to your actual filenames):

```
checkpoints/
├── UAVid_RIS/
│   └── model_best_AeroReformer.pth
└── VDD_RIS/
    └── model_best_AeroReformer.pth

pretrained_weights/
└── swin_base_patch4_window12_384_22k.pth   # if you choose to mirror it here
```

All training/inference code lives in the GitHub repo.

---

## How to use these weights

### 1) Download with `huggingface_hub`

```python
import torch
from huggingface_hub import hf_hub_download

# UAVid-RIS best checkpoint
uavid_ckpt = hf_hub_download(
    repo_id="<your-org-or-username>/AeroReformer-Checkpoints",
    filename="checkpoints/UAVid_RIS/model_best_AeroReformer.pth",
)

# VDD-RIS best checkpoint
vdd_ckpt = hf_hub_download(
    repo_id="<your-org-or-username>/AeroReformer-Checkpoints",
    filename="checkpoints/VDD_RIS/model_best_AeroReformer.pth",
)

# (Optional) Swin init
swin_init = hf_hub_download(
    repo_id="<your-org-or-username>/AeroReformer-Checkpoints",
    filename="pretrained_weights/swin_base_patch4_window12_384_22k.pth",
)

uavid_state = torch.load(uavid_ckpt, map_location="cpu")
vdd_state = torch.load(vdd_ckpt, map_location="cpu")
```

> Replace `repo_id` with this repo's actual path on the Hub.
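
If you prefer to mirror the whole checkpoint tree at once instead of fetching files individually, `snapshot_download` from the same library can do it. A minimal sketch; the `repo_id` placeholder and path patterns follow the example layout above:

```python
from huggingface_hub import snapshot_download

def fetch_all(repo_id: str) -> str:
    """Mirror the checkpoint and pretrained-weight folders into the local cache."""
    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=["checkpoints/**", "pretrained_weights/**"],
    )

# Example: local_dir = fetch_all("<your-org-or-username>/AeroReformer-Checkpoints")
```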

### 2) Load into the official implementation

Follow the **loading utilities and model definitions** from the GitHub repo:

* Code & instructions: [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)
* Typical usage: run its `test.py` / `train.py` with `--resume` pointing to the checkpoint path you just downloaded.

Example (UAVid-RIS testing), to be **run inside the code repo**:

```bash
python test.py --swin_type base --dataset uavid_ris \
    --resume /path/to/model_best_AeroReformer.pth \
    --model_id AeroReformer --split test --workers 4 --window12 \
    --img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4
```
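
Before wiring a downloaded file into the code repo, it can help to sanity-check what the `.pth` actually contains. The checkpoint layout is defined by the training scripts on GitHub, so the keys returned here are simply whatever those scripts saved (a minimal sketch):

```python
import torch

def inspect_checkpoint(path: str) -> list:
    """List the top-level keys of a saved .pth checkpoint."""
    state = torch.load(path, map_location="cpu")
    # Training scripts commonly save a dict with entries such as
    # "model" or "epoch"; a raw state_dict maps parameter names to tensors.
    if isinstance(state, dict):
        return list(state.keys())
    return [type(state).__name__]
```

Usage: `inspect_checkpoint(uavid_ckpt)` after the download snippet above.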

---

## Recommended environment

* **Python:** 3.10
* **PyTorch:** 2.3.1
* **CUDA:** 12.4 (or adapt to your setup)
* **OS:** Ubuntu (verified by the authors)

> The exact dependency setup and `requirements.txt` are maintained in the GitHub repo.

---

## Datasets

Experiments use **UAVid-RIS** and **VDD-RIS**.
Referring expressions were generated by **Qwen** and **LLaMA** and may include **errors or inconsistencies**.

* Text labels:
  * `lironui/UAVid-RIS`
  * `lironui/VDD-RIS`
* Raw imagery:
  * UAVid: [https://uavid.nl/](https://uavid.nl/)
  * VDD: [https://huggingface.co/datasets/RussRobin/VDD](https://huggingface.co/datasets/RussRobin/VDD)

Preprocessing and split scripts are provided in the GitHub repo.
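
The text-label repos can be fetched the same way as the weights; the only difference is `repo_type="dataset"`, which targets the dataset hub instead of the model hub (a sketch, assuming you just want the raw label files locally):

```python
from huggingface_hub import snapshot_download

def fetch_labels(repo_id: str) -> str:
    """Pull a text-label dataset repo into the local cache."""
    return snapshot_download(repo_id=repo_id, repo_type="dataset")

# Example: labels_dir = fetch_labels("lironui/UAVid-RIS")
```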

---

## Notes & Limitations

* Checkpoints here are **frozen artifacts**; for training, fine-tuning, or code changes, use the GitHub repo.
* Performance depends on preprocessing, image size, and split strategy; follow the original scripts for reproducibility.
* Auto-generated text expressions may contain **ambiguity and noise**.

---

## Acknowledgements

AeroReformer builds upon:

* [LAVT](https://github.com/yz93/LAVT-RIS)
* [RMSIN](https://github.com/Lsan2401/RMSIN)

---

## Citation

```bibtex
@article{AeroReformer2025,
  title   = {AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation},
  author  = {<Author List>},
  journal = {arXiv preprint arXiv:2502.16680},
  year    = {2025}
}
```

---

## License

Unless stated otherwise, the weights are distributed under **Apache-2.0**.
Please also respect the licenses and terms of the underlying datasets and the Swin Transformer weights.

---

### Maintainer tips (optional to keep in the README)

* Use **Git LFS** for large `.pth` files.
* Consider adding **SHA256 checksums** to the filenames or this README for integrity.
* If you later add more variants (e.g., different image sizes or training schedules), list them in a small **Model Index** table:

| Variant | Dataset   | Img Size | Epochs | File                                                | mIoU (paper) |
| ------- | --------- | -------: | -----: | --------------------------------------------------- | -----------: |
| Base    | UAVid-RIS |      480 |     40 | `checkpoints/UAVid_RIS/model_best_AeroReformer.pth` |            – |
| Base    | VDD-RIS   |      480 |     10 | `checkpoints/VDD_RIS/model_best_AeroReformer.pth`   |            – |
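
For the checksum tip above, a small helper can produce the digests to record in this README (the path in the usage comment is the example filename from the layout section):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so multi-GB .pth files don't load into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example: sha256_of("checkpoints/UAVid_RIS/model_best_AeroReformer.pth")
```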