lironui committed
Commit 96693c2 · verified · 1 Parent(s): 3b2c268

Update README.md

Files changed (1):
  1. README.md +183 -13
README.md CHANGED
@@ -4,27 +4,197 @@ license: mit
 
  # **AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation**
 
- 🚀 **AeroReformer** is a novel **vision-language framework** for **UAV-based referring image segmentation (UAV-RIS)**, designed to tackle the unique challenges of aerial imagery, including complex spatial scales, occlusions, and diverse object orientations.
 
- Our approach integrates a **Vision-Language Cross-Attention Module (VLCAM)** for enhanced multimodal understanding and a **Rotation-Aware Multi-Scale Fusion (RAMSF) decoder** to improve segmentation accuracy in aerial scenes.
 
- The datasets and code will be publicly available at:
- **[GitHub Repository](https://github.com/lironui/AeroReformer)**
 
  ---
 
- ## **📝 Paper Status**
- The research paper detailing **AeroReformer** has been submitted and will be available soon. Stay tuned for updates!
 
  ---
 
- ## **📌 Model Overview**
- **AeroReformer** is a **transformer-based vision-language model** designed for **referring segmentation in UAV imagery**. It enables precise **object localization and segmentation** based on **natural language descriptions**, overcoming the limitations of existing RIS models in aerial datasets.
 
- ### **🔹 Key Features**
- ✅ **Automatic Labeling Pipeline**: Leverages pre-existing UAV segmentation datasets and Multimodal Large Language Models (MLLMs) to generate textual descriptions.
- ✅ **Vision-Language Cross-Attention Module (VLCAM)**: Facilitates efficient cross-modal interaction for improved referring segmentation.
- ✅ **Rotation-Aware Multi-Scale Fusion (RAMSF) Decoder**: Enhances segmentation robustness by addressing scale variations and rotational transformations in aerial imagery.
- ✅ **New UAV-RIS Benchmark**: Sets a new standard for UAV-based referring segmentation through extensive evaluation on multiple datasets.
 
  ---
 
+ ---
+ tags:
+ - image-segmentation
+ - referring-image-segmentation
+ - vision-language
+ - aerial-imagery
+ - uav
+ - pytorch
+ license: apache-2.0
+ library_name: pytorch
+ datasets:
+ - lironui/UAVid-RIS
+ - lironui/VDD-RIS
+ pipeline_tag: image-segmentation
  ---
 
+ # AeroReformer (Pre-trained Checkpoints)
+
+ **This repository hosts pre-trained checkpoints for AeroReformer.**
+ For the **full implementation, training, and evaluation scripts**, see the official codebase:
+ **GitHub:** [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)
+
+ > Paper: [AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation](https://arxiv.org/pdf/2502.16680)
+
+ AeroReformer is a vision-language framework for UAV-based referring image segmentation (UAV-RIS), featuring:
+
+ * **VLCAM**: Vision-Language Cross-Attention Module for robust multimodal grounding (a generic sketch of the idea follows this list).
+ * **RAMSF**: Rotation-Aware Multi-Scale Fusion decoder for aerial scenes with large scale and rotation variance.
+
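+ To make the cross-attention idea concrete, here is a minimal, generic sketch in PyTorch. It is **not** the authors' VLCAM: the module name, dimensions, and wiring are illustrative assumptions, and the real implementation lives in the GitHub repo.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class VisionLanguageCrossAttentionSketch(nn.Module):
+     """Visual tokens attend to word features (query = vision, key/value = text)."""
+     def __init__(self, dim: int = 256, num_heads: int = 8):
+         super().__init__()
+         self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
+         self.norm = nn.LayerNorm(dim)
+
+     def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
+         # visual: (B, H*W, C) flattened image tokens; text: (B, L, C) word features
+         fused, _ = self.attn(query=visual, key=text, value=text)
+         return self.norm(visual + fused)  # residual keeps the visual stream intact
+
+ # Toy shapes: a 30x30 visual grid fused with a 20-token referring expression
+ out = VisionLanguageCrossAttentionSketch()(torch.randn(2, 900, 256), torch.randn(2, 20, 256))
+ print(out.shape)  # torch.Size([2, 900, 256])
+ ```
+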
+ ---
+
+ ## 📦 What's in this repo?
+
+ Pre-trained model weights (PyTorch `.pth`) for:
+
+ * **UAVid-RIS** (best checkpoint)
+ * **VDD-RIS** (best checkpoint)
+ * (Optional) Swin Transformer init weights, mirrored for convenience
+
+ > Filenames (example; adapt to your actual filenames):
+
+ ```
+ checkpoints/
+ ├── UAVid_RIS/
+ │   └── model_best_AeroReformer.pth
+ └── VDD_RIS/
+     └── model_best_AeroReformer.pth
+
+ pretrained_weights/
+ └── swin_base_patch4_window12_384_22k.pth  # if you choose to mirror it here
+ ```
+
+ All training and inference code lives in the GitHub repo.
 
  ---
 
+ ## 🚀 How to use these weights
+
+ ### 1) Download with `huggingface_hub`
+
+ ```python
+ import torch
+ from huggingface_hub import hf_hub_download
+
+ # UAVid-RIS best checkpoint
+ uavid_ckpt = hf_hub_download(
+     repo_id="<your-org-or-username>/AeroReformer-Checkpoints",
+     filename="checkpoints/UAVid_RIS/model_best_AeroReformer.pth",
+ )
+
+ # VDD-RIS best checkpoint
+ vdd_ckpt = hf_hub_download(
+     repo_id="<your-org-or-username>/AeroReformer-Checkpoints",
+     filename="checkpoints/VDD_RIS/model_best_AeroReformer.pth",
+ )
+
+ # (Optional) Swin init
+ swin_init = hf_hub_download(
+     repo_id="<your-org-or-username>/AeroReformer-Checkpoints",
+     filename="pretrained_weights/swin_base_patch4_window12_384_22k.pth",
+ )
+
+ # Load on CPU first; move tensors to GPU once the model is built
+ uavid_state = torch.load(uavid_ckpt, map_location="cpu")
+ vdd_state = torch.load(vdd_ckpt, map_location="cpu")
+ ```
+
+ > Replace `repo_id` with this repo's actual path on the Hub.
+
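+ If you prefer to fetch everything at once, `huggingface_hub` also offers `snapshot_download`; a minimal sketch (the `repo_id` is a placeholder, as above):
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Downloads the whole repository (all checkpoints) into the local cache
+ # and returns the path of the local directory.
+ local_dir = snapshot_download(repo_id="<your-org-or-username>/AeroReformer-Checkpoints")
+ print(local_dir)
+ ```
+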
+ ### 2) Load into the official implementation
+
+ Follow the **loading utilities and model definitions** from the GitHub repo:
+
+ * Code & instructions: [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)
+ * Typical usage: run the repo's `test.py` / `train.py` with `--resume` pointing to the checkpoint you just downloaded.
+
+ Example (UAVid-RIS testing), to be **run in the code repo**:
+
+ ```bash
+ python test.py --swin_type base --dataset uavid_ris \
+     --resume /path/to/model_best_AeroReformer.pth \
+     --model_id AeroReformer --split test --workers 4 --window12 \
+     --img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4
+ ```
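+
+ Before wiring a checkpoint into the model, it can help to inspect what the `.pth` file actually contains; a minimal sketch using plain PyTorch (the top-level key names are unknown here and depend on how the checkpoint was saved):
+
+ ```python
+ import torch
+
+ state = torch.load("model_best_AeroReformer.pth", map_location="cpu")
+ # Checkpoints are often dicts that nest the weights under a key such as
+ # "model" or "state_dict"; print the top-level keys to find out.
+ if isinstance(state, dict):
+     print(list(state.keys())[:10])
+ else:
+     print(type(state))
+ ```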
+
+ ---
+
+ ## ✅ Recommended environments
+
+ * **Python:** 3.10
+ * **PyTorch:** 2.3.1
+ * **CUDA:** 12.4 (or adapt to your setup)
+ * **OS:** Ubuntu (verified by the authors)
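+
+ To confirm your local setup matches, a quick check (any recent PyTorch build exposes these attributes):
+
+ ```python
+ import torch
+
+ # Prints the installed PyTorch version, the CUDA version it was built
+ # against (None for CPU-only builds), and whether a GPU is visible.
+ print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
+ ```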
 
+ > Exact dependency setup and `requirements.txt` are maintained in the GitHub repo.
 
  ---
+
+ ## 📂 Datasets
+
+ Experiments use **UAVid-RIS** and **VDD-RIS**.
+ Referring expressions were generated by **Qwen** and **LLaMA** and may include **errors/inconsistencies**.
+
+ * Text labels:
+   * `lironui/UAVid-RIS`
+   * `lironui/VDD-RIS`
+ * Raw imagery:
+   * UAVid: [https://uavid.nl/](https://uavid.nl/)
+   * VDD: [https://huggingface.co/datasets/RussRobin/VDD](https://huggingface.co/datasets/RussRobin/VDD)
+
+ Preprocessing and split scripts are provided in the GitHub repo.
+
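+ To see which label files the dataset repos actually ship before downloading anything, a small sketch using `huggingface_hub` (repo names as listed above; the output depends on the repos' current contents):
+
+ ```python
+ from huggingface_hub import list_repo_files
+
+ # List the files in each text-label dataset repo on the Hub
+ for repo in ("lironui/UAVid-RIS", "lironui/VDD-RIS"):
+     files = list_repo_files(repo, repo_type="dataset")
+     print(repo, "->", files[:5], "...")
+ ```
+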
+ ---
+
+ ## ⚠️ Notes & Limitations
+
+ * Checkpoints here are **frozen artifacts**; for training/fine-tuning or code changes, use the GitHub repo.
+ * Performance depends on preprocessing, image size, and split strategy; follow the original scripts for reproducibility.
+ * Auto-generated text expressions may contain **ambiguity/noise**.
+
+ ---
+
+ ## 🙏 Acknowledgements
+
+ AeroReformer builds upon:
+
+ * [LAVT](https://github.com/yz93/LAVT-RIS)
+ * [RMSIN](https://github.com/Lsan2401/RMSIN)
+
+ ---
+
+ ## 📜 Citation
+
+ ```bibtex
+ @article{AeroReformer2025,
+   title   = {AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation},
+   author  = {<Author List>},
+   journal = {arXiv preprint arXiv:2502.16680},
+   year    = {2025}
+ }
+ ```
+
+ ---
+
+ ## 🔐 License
+
+ Unless stated otherwise, weights are distributed under **Apache-2.0**.
+ Please also respect the licenses/terms of the underlying datasets and the Swin Transformer weights.
+
+ ---
+
+ ### Maintainer tips (optional to keep in the README)
+
+ * Use **Git LFS** for large `.pth` files.
+ * Consider adding **SHA256 checksums** to the filenames or this README for integrity (a checksum sketch follows the table below).
+ * If you later add more variants (e.g., different image sizes or training schedules), list them in a small **Model Index** table:
+
+ | Variant | Dataset   | Img Size | Epochs | File                                                | mIoU (paper) |
+ | ------- | --------- | -------: | -----: | --------------------------------------------------- | -----------: |
+ | Base    | UAVid-RIS |      480 |     40 | `checkpoints/UAVid_RIS/model_best_AeroReformer.pth` |            — |
+ | Base    | VDD-RIS   |      480 |     10 | `checkpoints/VDD_RIS/model_best_AeroReformer.pth`   |            — |
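+
+ A minimal checksum sketch using only the Python standard library (the path is a placeholder matching the example layout above):
+
+ ```python
+ import hashlib
+
+ def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
+     """Compute a file's SHA256 in 1 MiB chunks, so large .pth files fit in memory."""
+     h = hashlib.sha256()
+     with open(path, "rb") as f:
+         for chunk in iter(lambda: f.read(chunk_size), b""):
+             h.update(chunk)
+     return h.hexdigest()
+
+ print(sha256sum("checkpoints/UAVid_RIS/model_best_AeroReformer.pth"))
+ ```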