---
license: mit
---

# **AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation**

**This repository hosts pre-trained checkpoints for AeroReformer.**
For the **full implementation, training, and evaluation scripts**, see the official codebase:

**GitHub:** [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)

> Paper: [AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation](https://arxiv.org/pdf/2502.16680)

AeroReformer is a vision-language framework for UAV-based referring image segmentation (UAV-RIS), featuring:

* **VLCAM**: Vision-Language Cross-Attention Module for robust multimodal grounding.
* **RAMSF**: Rotation-Aware Multi-Scale Fusion decoder for aerial scenes with large scale/rotation variance.
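
As a rough intuition for the VLCAM bullet above, here is a *generic* vision-language cross-attention block in PyTorch. It illustrates the mechanism only and is not the authors' VLCAM; the actual module is defined in the GitHub repo.

```python
import torch
import torch.nn as nn

class VisionLanguageCrossAttention(nn.Module):
    """Generic sketch: vision tokens attend to language tokens.

    Illustrative only -- the real VLCAM design lives in the official codebase.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis: torch.Tensor, lang: torch.Tensor) -> torch.Tensor:
        # vis: (B, N_vis, C) flattened image features; lang: (B, N_txt, C) text embeddings
        fused, _ = self.attn(query=vis, key=lang, value=lang)
        return self.norm(vis + fused)  # residual fusion grounds vision in language
```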

---

## πŸ“¦ What’s in this repo?

Pre-trained model weights (PyTorch `.pth`) for:

* **UAVid-RIS** (best checkpoint)
* **VDD-RIS** (best checkpoint)

All training/inference code lives in the GitHub repo.

---

## πŸš€ How to use these weights

### 1) Download with `huggingface_hub`
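
A minimal sketch using `hf_hub_download` (`pip install huggingface_hub`). The `repo_id` and `filename` below are assumptions; check this repo's file listing for the exact checkpoint names.

```python
from huggingface_hub import hf_hub_download

# Download one checkpoint from the Hub into the local cache.
ckpt_path = hf_hub_download(
    repo_id="lironui/AeroReformer",          # assumed id of this model repo
    filename="model_best_AeroReformer.pth",  # assumed checkpoint file name
)
print(ckpt_path)  # pass this path to --resume in the official scripts
```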


### 2) Load into the official implementation

Use the **loading utilities and model definitions** from the GitHub repo:

* Code & instructions: [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)
* Typical usage: run their `test.py` / `train.py` while pointing `--resume` to the checkpoint path you just downloaded.

Example (UAVid-RIS testing) β€” **run in the code repo**:

```bash
python test.py --swin_type base --dataset uavid_ris \
  --resume /path/to/model_best_AeroReformer.pth \
  --model_id AeroReformer --split test --workers 4 --window12 \
  --img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4
```
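
To sanity-check a downloaded file before wiring it into the official scripts, a generic PyTorch inspection works. This sketch assumes the `.pth` file is a standard checkpoint dictionary saved with `torch.save`; the official code is authoritative on its actual structure.

```python
import torch

# Load on CPU and peek at the checkpoint structure.
ckpt = torch.load("/path/to/model_best_AeroReformer.pth", map_location="cpu")

if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])         # top-level keys, e.g. model/epoch/...
    state_dict = ckpt.get("model", ckpt)  # "model" is an assumed key name
    print(f"{len(state_dict)} entries in the state dict")
```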

---

## βœ… Recommended environments

* **Python:** 3.10
* **PyTorch:** 2.3.1
* **CUDA:** 12.4 (or adapt to your setup)
* **OS:** Ubuntu (verified by the authors)

> Exact dependency setup and `requirements.txt` are maintained in the GitHub repo.

---

## πŸ“‚ Datasets

Experiments use **UAVid-RIS** and **VDD-RIS**.
Referring expressions were generated by **Qwen** and **LLaMA** and may include **errors/inconsistencies**.

* Text labels:

  * [lironui/UAVid-RIS](https://huggingface.co/datasets/lironui/UAVid-RIS)
  * [lironui/VDD-RIS](https://huggingface.co/datasets/lironui/VDD-RIS)
* Raw imagery:

  * UAVid: [https://uavid.nl/](https://uavid.nl/)
  * VDD: [https://huggingface.co/datasets/RussRobin/VDD](https://huggingface.co/datasets/RussRobin/VDD)

Preprocessing and split scripts are provided in the GitHub repo.
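
To fetch the text labels locally, a generic `snapshot_download` works regardless of how the dataset files are laid out (a sketch; pair the labels with the raw imagery listed above):

```python
from huggingface_hub import snapshot_download

# Download the referring-expression labels; returns the local folder path.
labels_dir = snapshot_download(repo_id="lironui/UAVid-RIS", repo_type="dataset")
print(labels_dir)
```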

---

## ⚠️ Notes & Limitations

* Checkpoints here are **frozen artifacts**; for training/fine-tuning or code changes, use the GitHub repo.
* Performance depends on preprocessing, image size, and split strategy; follow the original scripts for reproducibility.
* Auto-generated text expressions may contain **ambiguity/noise**.

---

## πŸ™ Acknowledgements

AeroReformer builds upon:

* [LAVT](https://github.com/yz93/LAVT-RIS)
* [RMSIN](https://github.com/Lsan2401/RMSIN)

---

## πŸ“œ Citation

```bibtex
@article{AeroReformer2025,
  title   = {AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation},
  author  = {<Author List>},
  journal = {arXiv preprint arXiv:2502.16680},
  year    = {2025}
}
```

---

## πŸ” License

Unless stated otherwise, the weights are distributed under the **MIT** license.
Please also respect the licenses/terms of the underlying datasets and the Swin-Transformer weights.

---