Update README.md

README.md (CHANGED)

@@ -4,27 +4,197 @@ license: mit
Removed from the previous README:

* **Automatic Labeling Pipeline**: leverages pre-existing UAV segmentation datasets and Multimodal Large Language Models (MLLMs) to generate textual descriptions.
* **Vision-Language Cross-Attention Module (VLCAM)**: enables efficient cross-modal interaction for improved referring segmentation.
* **Rotation-Aware Multi-Scale Fusion (RAMSF) Decoder**: improves segmentation robustness to the scale variations and rotations of aerial imagery.
* **New UAV-RIS Benchmark**: sets a new standard for UAV-based referring segmentation through extensive evaluation on multiple datasets.
# **AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation**
---
tags:
- image-segmentation
- referring-image-segmentation
- vision-language
- aerial-imagery
- uav
- pytorch
license: apache-2.0
library_name: pytorch
datasets:
- lironui/UAVid-RIS
- lironui/VDD-RIS
pipeline_tag: image-segmentation
---

# AeroReformer (Pre-trained Checkpoints)

**This repository hosts pre-trained checkpoints for AeroReformer.**
For the **full implementation, training, and evaluation scripts**, see the official codebase:
**GitHub:** [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)

> Paper: [AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation](https://arxiv.org/pdf/2502.16680)

AeroReformer is a vision-language framework for UAV-based referring image segmentation (UAV-RIS), featuring:

* **VLCAM**: a Vision-Language Cross-Attention Module for robust multimodal grounding.
* **RAMSF**: a Rotation-Aware Multi-Scale Fusion decoder for aerial scenes with large scale and rotation variance.

---

## What's in this repo?

Pre-trained model weights (PyTorch `.pth`) for:

* **UAVid-RIS** (best checkpoint)
* **VDD-RIS** (best checkpoint)
* (Optional) Swin Transformer init weights, mirrored for convenience

> Filenames (example; adapt to your actual filenames):

```
checkpoints/
├── UAVid_RIS/
│   └── model_best_AeroReformer.pth
└── VDD_RIS/
    └── model_best_AeroReformer.pth

pretrained_weights/
└── swin_base_patch4_window12_384_22k.pth   # if you choose to mirror it here
```

All training/inference code lives in the GitHub repo.

---

## How to use these weights

### 1) Download with `huggingface_hub`

```python
import torch
from huggingface_hub import hf_hub_download

# UAVid-RIS best checkpoint
uavid_ckpt = hf_hub_download(
    repo_id="<your-org-or-username>/AeroReformer-Checkpoints",
    filename="checkpoints/UAVid_RIS/model_best_AeroReformer.pth",
)

# VDD-RIS best checkpoint
vdd_ckpt = hf_hub_download(
    repo_id="<your-org-or-username>/AeroReformer-Checkpoints",
    filename="checkpoints/VDD_RIS/model_best_AeroReformer.pth",
)

# (Optional) Swin init
swin_init = hf_hub_download(
    repo_id="<your-org-or-username>/AeroReformer-Checkpoints",
    filename="pretrained_weights/swin_base_patch4_window12_384_22k.pth",
)

uavid_state = torch.load(uavid_ckpt, map_location="cpu")
vdd_state = torch.load(vdd_ckpt, map_location="cpu")
```

> Replace `repo_id` with this repo's actual path on the Hub.
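
If you prefer to mirror the whole checkpoint tree at once instead of fetching files individually, `snapshot_download` from the same library can do it. A minimal sketch; the `repo_id` placeholder and path patterns follow the example layout above:

```python
from huggingface_hub import snapshot_download

def fetch_all(repo_id: str) -> str:
    """Mirror the checkpoint and pretrained-weight folders into the local cache."""
    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=["checkpoints/**", "pretrained_weights/**"],
    )

# Example: local_dir = fetch_all("<your-org-or-username>/AeroReformer-Checkpoints")
```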

### 2) Load into the official implementation

Follow the **loading utilities and model definitions** from the GitHub repo:

* Code & instructions: [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)
* Typical usage: run its `test.py` / `train.py` with `--resume` pointing to the checkpoint path you just downloaded.

Example (UAVid-RIS testing), to be **run inside the code repo**:

```bash
python test.py --swin_type base --dataset uavid_ris \
    --resume /path/to/model_best_AeroReformer.pth \
    --model_id AeroReformer --split test --workers 4 --window12 \
    --img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4
```
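
Before wiring a downloaded file into the code repo, it can help to sanity-check what the `.pth` actually contains. The checkpoint layout is defined by the training scripts on GitHub, so the keys returned here are simply whatever those scripts saved (a minimal sketch):

```python
import torch

def inspect_checkpoint(path: str) -> list:
    """List the top-level keys of a saved .pth checkpoint."""
    state = torch.load(path, map_location="cpu")
    # Training scripts commonly save a dict with entries such as
    # "model" or "epoch"; a raw state_dict maps parameter names to tensors.
    if isinstance(state, dict):
        return list(state.keys())
    return [type(state).__name__]
```

Usage: `inspect_checkpoint(uavid_ckpt)` after the download snippet above.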

---

## Recommended environment

* **Python:** 3.10
* **PyTorch:** 2.3.1
* **CUDA:** 12.4 (or adapt to your setup)
* **OS:** Ubuntu (verified by the authors)

> The exact dependency setup and `requirements.txt` are maintained in the GitHub repo.

---

## Datasets

Experiments use **UAVid-RIS** and **VDD-RIS**.
Referring expressions were generated by **Qwen** and **LLaMA** and may include **errors or inconsistencies**.

* Text labels:
  * `lironui/UAVid-RIS`
  * `lironui/VDD-RIS`
* Raw imagery:
  * UAVid: [https://uavid.nl/](https://uavid.nl/)
  * VDD: [https://huggingface.co/datasets/RussRobin/VDD](https://huggingface.co/datasets/RussRobin/VDD)

Preprocessing and split scripts are provided in the GitHub repo.
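
The text-label repos can be fetched the same way as the weights; the only difference is `repo_type="dataset"`, which targets the dataset hub instead of the model hub (a sketch, assuming you just want the raw label files locally):

```python
from huggingface_hub import snapshot_download

def fetch_labels(repo_id: str) -> str:
    """Pull a text-label dataset repo into the local cache."""
    return snapshot_download(repo_id=repo_id, repo_type="dataset")

# Example: labels_dir = fetch_labels("lironui/UAVid-RIS")
```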

---

## Notes & Limitations

* Checkpoints here are **frozen artifacts**; for training, fine-tuning, or code changes, use the GitHub repo.
* Performance depends on preprocessing, image size, and split strategy; follow the original scripts for reproducibility.
* Auto-generated text expressions may contain **ambiguity and noise**.

---

## Acknowledgements

AeroReformer builds upon:

* [LAVT](https://github.com/yz93/LAVT-RIS)
* [RMSIN](https://github.com/Lsan2401/RMSIN)

---

## Citation

```bibtex
@article{AeroReformer2025,
  title   = {AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation},
  author  = {<Author List>},
  journal = {arXiv preprint arXiv:2502.16680},
  year    = {2025}
}
```

---

## License

Unless stated otherwise, the weights are distributed under **Apache-2.0**.
Please also respect the licenses and terms of the underlying datasets and the Swin Transformer weights.

---

### Maintainer tips (optional to keep in the README)

* Use **Git LFS** for large `.pth` files.
* Consider adding **SHA256 checksums** to the filenames or this README for integrity.
* If you later add more variants (e.g., different image sizes or training schedules), list them in a small **Model Index** table:

| Variant | Dataset   | Img Size | Epochs | File                                                | mIoU (paper) |
| ------- | --------- | -------: | -----: | --------------------------------------------------- | -----------: |
| Base    | UAVid-RIS |      480 |     40 | `checkpoints/UAVid_RIS/model_best_AeroReformer.pth` |            – |
| Base    | VDD-RIS   |      480 |     10 | `checkpoints/VDD_RIS/model_best_AeroReformer.pth`   |            – |
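
For the checksum tip above, a small helper can produce the digests to record in this README (the path in the usage comment is the example filename from the layout section):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so multi-GB .pth files don't load into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example: sha256_of("checkpoints/UAVid_RIS/model_best_AeroReformer.pth")
```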