	---
	license: mit
	---

	# AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation

	This repository hosts pre-trained checkpoints for AeroReformer.
	For the full implementation, including the training and evaluation scripts, see the official codebase:

	GitHub: [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)

	> Paper: [AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation](https://arxiv.org/pdf/2502.16680)

	AeroReformer is a vision-language framework for UAV-based referring image segmentation (UAV-RIS), featuring:

	* VLCAM: Vision-Language Cross-Attention Module for robust multimodal grounding.
	* RAMSF: Rotation-Aware Multi-Scale Fusion decoder for aerial scenes with large variations in scale and rotation.
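
The cross-attention idea behind a module like VLCAM can be sketched in plain Python as generic single-head scaled dot-product cross-attention. Here visual tokens act as queries and word features act as keys and values. This is an illustration of the general technique, not the VLCAM implementation: learned projections, multiple heads, and normalization are omitted, and `cross_attention` is a hypothetical helper. See the paper and the GitHub code for the actual module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(vision_tokens, text_tokens):
    """Single-head scaled dot-product cross-attention (no learned projections).

    Each vision token (query) attends over all text tokens (keys = values),
    so each output is a vision-specific weighted mix of word features.
    Returns (outputs, weights); weights[i][j] is how much visual token i
    attends to word j.
    """
    dim = len(text_tokens[0])
    outputs, weights = [], []
    for q in vision_tokens:
        # Similarity between this visual token and every word, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in text_tokens]
        attn = softmax(scores)
        weights.append(attn)
        # Weighted sum of word features, one value per feature dimension.
        outputs.append([sum(a * v[j] for a, v in zip(attn, text_tokens)) for j in range(dim)])
    return outputs, weights
```

Each row of `weights` sums to 1, so it can be read as a distribution over the words in the referring expression for one image location.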

	---

	## 📦 What’s in this repo?

	Pre-trained model weights (PyTorch `.pth`) for:

	* UAVid-RIS (best checkpoint)
	* VDD-RIS (best checkpoint)

	All training/inference code lives in the GitHub repo.

	---

	## 🚀 How to use these weights

	### 1) Download with `huggingface_hub`
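
A minimal sketch using `huggingface_hub.snapshot_download`. The repo id `lironui/AeroReformer` is taken from this model card's URL. Because the exact checkpoint filenames are not listed here, the sketch downloads every `.pth` file in the repo instead of naming one. `download_checkpoints` and `find_checkpoints` are illustrative helper names, not part of the official code.

```python
from pathlib import Path

# Repo id from this model card's URL. Checkpoint filenames are not listed
# in the card, so every .pth file in the repo is downloaded.
REPO_ID = "lironui/AeroReformer"


def download_checkpoints(local_dir=None):
    """Download all .pth files from the Hub and return their local paths."""
    # Imported here so find_checkpoints() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_dir = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=["*.pth"],
        local_dir=local_dir,
    )
    return find_checkpoints(snapshot_dir)


def find_checkpoints(folder):
    """Return the sorted list of .pth files found anywhere under `folder`."""
    return sorted(Path(folder).rglob("*.pth"))


# Example (requires network access):
# for path in download_checkpoints():
#     print(path)
```

One of the returned paths is what you pass to `--resume` when running the official scripts.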


	### 2) Load into the official implementation

	Use the model definitions and loading utilities from the GitHub repo:

	* Code & instructions: [https://github.com/lironui/AeroReformer](https://github.com/lironui/AeroReformer)
	* Typical usage: run the repo's `test.py` or `train.py` with `--resume` pointing to the checkpoint you just downloaded.
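
Before resuming, it can help to inspect a checkpoint on the CPU. The sketch below assumes the file is either a plain state dict or a dict that wraps it under a `"model"` key, a layout used by LAVT's scripts, which AeroReformer builds on. This is an assumption, so check the official loading code for the actual format. `extract_state_dict` and `load_checkpoint` are hypothetical helper names.

```python
from collections.abc import Mapping


def extract_state_dict(checkpoint):
    """Return the parameter dict from a loaded checkpoint object.

    Assumption: the checkpoint is a plain state dict, or a dict that stores
    it under a "model" key next to other entries (epoch, optimizer, ...).
    """
    if isinstance(checkpoint, Mapping) and isinstance(checkpoint.get("model"), Mapping):
        checkpoint = checkpoint["model"]
    # Strip the "module." prefix that (Distributed)DataParallel adds, if present.
    return {
        (key[len("module."):] if key.startswith("module.") else key): value
        for key, value in checkpoint.items()
    }


def load_checkpoint(path, weights_only=True):
    """Load a .pth file onto the CPU and return its state dict.

    If loading fails because the file stores non-tensor Python objects,
    retry with weights_only=False, but only for files you trust.
    """
    import torch  # imported lazily so extract_state_dict() has no torch dependency

    checkpoint = torch.load(path, map_location="cpu", weights_only=weights_only)
    return extract_state_dict(checkpoint)
```

For example, `len(load_checkpoint(path))` gives the number of parameter tensors, and the keys can be compared with `model.state_dict().keys()` from the official model definition.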

	Example: UAVid-RIS testing (run inside the code repo):

	```bash
	python test.py --swin_type base --dataset uavid_ris \
	--resume /path/to/model_best_AeroReformer.pth \
	--model_id AeroReformer --split test --workers 4 --window12 \
	--img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4
	```
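
To launch the same evaluation from Python (for example, right after downloading the weights), the command can be assembled as an argument list. Only the flags from the example above are used. `build_uavid_test_command` and `run_uavid_test` are hypothetical helpers, and `sys.executable` stands in for `python`.

```python
import subprocess
import sys


def build_uavid_test_command(checkpoint_path, data_root="./data/UAVid_RIS/"):
    """Return the UAVid-RIS test command from the README as an argument list.

    Only the checkpoint path and the data root are parameterized; every
    other flag is copied from the README example.
    """
    return [
        sys.executable, "test.py",
        "--swin_type", "base",
        "--dataset", "uavid_ris",
        "--resume", str(checkpoint_path),
        "--model_id", "AeroReformer",
        "--split", "test",
        "--workers", "4",
        "--window12",
        "--img_size", "480",
        "--refer_data_root", data_root,
        "--mha", "4-4-4-4",
    ]


def run_uavid_test(checkpoint_path, repo_dir="."):
    """Run test.py inside the code repo; raises CalledProcessError on failure."""
    subprocess.run(build_uavid_test_command(checkpoint_path), cwd=repo_dir, check=True)
```

Passing an argument list (rather than a shell string) avoids quoting problems when the checkpoint path contains spaces.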

	---

	## ✅ Recommended environments

	* Python: 3.10
	* PyTorch: 2.3.1
	* CUDA: 12.4 (or adapt to your setup)
	* OS: Ubuntu (verified by authors)

	> Exact dependency setup and `requirements.txt` are maintained in the GitHub repo.
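
A small sketch for comparing a local environment with the versions listed above. It reads `torch.__version__` and `torch.version.cuda` when PyTorch is installed. Note that the latter is the CUDA version PyTorch was built against. A mismatch is only a hint, since other versions may also work. `RECOMMENDED`, `version_matches`, and `report_environment` are illustrative names.

```python
import platform

# Versions recommended in this model card.
RECOMMENDED = {"python": "3.10", "torch": "2.3.1", "cuda": "12.4"}


def version_matches(installed, recommended):
    """True if `installed` starts with the recommended version components.

    Local build tags are ignored, so "2.3.1+cu121" matches "2.3.1".
    """
    if installed is None:
        return False
    base = installed.split("+")[0].split(".")
    wanted = recommended.split(".")
    return base[: len(wanted)] == wanted


def report_environment():
    """Print installed versions next to the recommended ones."""
    installed = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        import torch

        installed["torch"] = torch.__version__
        installed["cuda"] = torch.version.cuda  # CUDA build version; None for CPU-only builds
    except ImportError:
        pass
    for name, wanted in RECOMMENDED.items():
        have = installed[name]
        status = "ok" if version_matches(have, wanted) else "differs"
        print(f"{name:7s} installed={have} recommended={wanted} [{status}]")
```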

	---

	## 📂 Datasets

	Experiments use UAVid-RIS and VDD-RIS.
	The referring expressions were generated with Qwen and LLaMA models and may contain errors or inconsistencies.

	* Text labels:
	  * [lironui/UAVid-RIS](https://huggingface.co/datasets/lironui/UAVid-RIS)
	  * [lironui/VDD-RIS](https://huggingface.co/datasets/lironui/VDD-RIS)
	* Raw imagery:
	  * UAVid: [https://uavid.nl/](https://uavid.nl/)
	  * VDD: [https://huggingface.co/datasets/RussRobin/VDD](https://huggingface.co/datasets/RussRobin/VDD)

	Preprocessing and split scripts are provided in the GitHub repo.
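
The text labels can be fetched with `huggingface_hub.snapshot_download` by setting `repo_type="dataset"`. The repo ids come from the links above. The raw UAVid and VDD imagery is not included and must be obtained separately. The folder layout the code expects is defined by the preprocessing scripts in the GitHub repo. `LABEL_REPOS`, `label_download_kwargs`, and `download_labels` are hypothetical names.

```python
# Hugging Face dataset repos holding the referring-expression labels
# (from the links above). Raw imagery is downloaded separately.
LABEL_REPOS = {
    "UAVid-RIS": "lironui/UAVid-RIS",
    "VDD-RIS": "lironui/VDD-RIS",
}


def label_download_kwargs(dataset, local_dir):
    """Arguments for huggingface_hub.snapshot_download for one label set."""
    if dataset not in LABEL_REPOS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {sorted(LABEL_REPOS)}")
    return {"repo_id": LABEL_REPOS[dataset], "repo_type": "dataset", "local_dir": local_dir}


def download_labels(dataset, local_dir):
    """Download the text labels for `dataset` and return the local folder path."""
    # Imported here so label_download_kwargs() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**label_download_kwargs(dataset, local_dir))
```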

	---

	## ⚠️ Notes & Limitations

	* Checkpoints here are frozen artifacts; for training/fine-tuning or code changes, use the GitHub repo.
	* Performance depends on preprocessing, image size, and split strategy; follow the original scripts for reproducibility.
	* Auto-generated text expressions may contain ambiguity/noise.

	---

	## 🙏 Acknowledgements

	AeroReformer builds upon:

	* [LAVT](https://github.com/yz93/LAVT-RIS)
	* [RMSIN](https://github.com/Lsan2401/RMSIN)

	---

	## 📜 Citation

	```bibtex
	@article{AeroReformer2025,
	  title   = {AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation},
	  author  = {<Author List>},
	  journal = {arXiv preprint arXiv:2502.16680},
	  year    = {2025}
	}
	```

	---

	## 🔐 License

	Unless stated otherwise, the weights are distributed under the MIT License.
	Please also respect the licenses and terms of the underlying datasets and of the Swin Transformer weights.

	---