AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation

This repository hosts pre-trained checkpoints for AeroReformer. For the full implementation and the training and evaluation scripts, see the official codebase:

GitHub: https://github.com/lironui/AeroReformer

Paper: AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation

AeroReformer is a vision-language framework for UAV-based referring image segmentation (UAV-RIS), featuring:

VLCAM: Vision-Language Cross-Attention Module for robust multimodal grounding.
RAMSF: Rotation-Aware Multi-Scale Fusion decoder for aerial scenes with large scale/rotation variance.

📦 What’s in this repo?

Pre-trained model weights (PyTorch .pth) for:

UAVid-RIS (best checkpoint)
VDD-RIS (best checkpoint)

All training/inference code lives in the GitHub repo.

🚀 How to use these weights

1) Download with `huggingface_hub`
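A minimal sketch of this step, assuming the checkpoint is fetched with `hf_hub_download`. The repo id `lironui/AeroReformer` is an assumption (this README does not state it), and the filename is the one used in the testing command in this README; check the Files tab of this repo and adjust both as needed.

```python
from pathlib import Path
from typing import Optional

# Assumptions, not confirmed by this README: the Hub repo id and the exact
# checkpoint filename. Verify both against the file listing of this repo.
REPO_ID = "lironui/AeroReformer"           # assumed repo id
FILENAME = "model_best_AeroReformer.pth"   # name taken from the test command in this README


def download_checkpoint(repo_id: str = REPO_ID,
                        filename: str = FILENAME,
                        cache_dir: Optional[str] = None) -> Path:
    """Download a single checkpoint file from the Hub and return its local path."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(repo_id=repo_id, filename=filename, cache_dir=cache_dir)
    return Path(local_path)


# Usage (requires network access and `pip install huggingface_hub`):
#   ckpt_path = download_checkpoint()
#   print(ckpt_path)
```

The returned path points into the local Hub cache, so it can be passed directly to `--resume` without copying the file.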

2) Load into the official implementation

Follow the loading utilities and model definitions from the GitHub repo:

Code & instructions: https://github.com/lironui/AeroReformer
Typical usage: run the repo's test.py or train.py with --resume set to the path of the checkpoint you downloaded.
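Before passing a checkpoint to `--resume`, it can help to inspect it on CPU. The sketch below assumes the file is either a raw state dict or a dict that nests the weights under a common key such as `model` or `state_dict`; the actual layout is defined by the GitHub repo, and both helper names are hypothetical.

```python
from typing import Any, Dict


def unwrap_state_dict(checkpoint: Any) -> Dict[str, Any]:
    """Return the weight mapping from a loaded checkpoint object.

    Assumption: the weights are either the top-level dict itself or nested
    under one of a few conventional keys. Adjust to the real layout.
    """
    if not isinstance(checkpoint, dict):
        raise TypeError(f"expected a dict-like checkpoint, got {type(checkpoint).__name__}")
    for key in ("model", "state_dict", "model_state_dict"):
        nested = checkpoint.get(key)
        if isinstance(nested, dict):
            return nested
    return checkpoint


def load_state_dict_cpu(path: str) -> Dict[str, Any]:
    """Load a .pth file onto the CPU and return its weight mapping."""
    import torch  # imported lazily; requires PyTorch

    # Newer PyTorch releases default to weights_only=True; if loading fails and
    # you trust the file, you may need to pass weights_only=False explicitly.
    checkpoint = torch.load(path, map_location="cpu")
    return unwrap_state_dict(checkpoint)


# Usage:
#   weights = load_state_dict_cpu("/path/to/model_best_AeroReformer.pth")
#   print(len(weights), "entries; first keys:", list(weights)[:5])
```

Listing the first few keys is a quick way to confirm the file matches the model definition in the code repo before starting a long evaluation run.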

Example (UAVid-RIS testing) — run in the code repo:

python test.py --swin_type base --dataset uavid_ris \
  --resume /path/to/model_best_AeroReformer.pth \
  --model_id AeroReformer --split test --workers 4 --window12 \
  --img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4

✅ Recommended environments

Python: 3.10
PyTorch: 2.3.1
CUDA: 12.4 (or adapt to your setup)
OS: Ubuntu (verified by authors)

Exact dependency setup and requirements.txt are maintained in the GitHub repo.
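To compare a local setup against the versions listed above, a small check like the following may be useful. It is a sketch with hypothetical helper names; it only reports differences and does not imply that other versions fail.

```python
import platform
import sys
from typing import Dict, Optional, Tuple

# Versions listed in this README.
RECOMMENDED = {"python": "3.10", "torch": "2.3.1", "cuda": "12.4"}


def environment_report() -> Dict[str, Optional[str]]:
    """Collect the Python, OS, PyTorch, and CUDA versions of this environment."""
    report: Dict[str, Optional[str]] = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "os": platform.system(),
        "torch": None,
        "cuda": None,
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch not installed; leave torch/cuda as None
    report["torch"] = str(torch.__version__)
    report["cuda"] = torch.version.cuda  # None for CPU-only builds
    return report


def mismatches(report: Dict[str, Optional[str]]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (recommended, found)} for every entry that differs."""
    out = {}
    for name, wanted in RECOMMENDED.items():
        found = report.get(name)
        if found is None or not found.startswith(wanted):
            out[name] = (wanted, found)
    return out


# Usage:
#   print(mismatches(environment_report()))
```

The prefix comparison treats a build such as `2.3.1+cu121` as matching `2.3.1`.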

📂 Datasets

Experiments use UAVid-RIS and VDD-RIS. Referring expressions were generated automatically with Qwen and LLaMA and may contain errors or inconsistencies.

Text labels:
- lironui/UAVid-RIS
- lironui/VDD-RIS
Raw imagery:
- UAVid: https://uavid.nl/
- VDD: https://huggingface.co/datasets/RussRobin/VDD

Preprocessing and split scripts are provided in the GitHub repo.
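The text labels could be fetched along these lines, assuming `lironui/UAVid-RIS` and `lironui/VDD-RIS` are dataset repos on the Hugging Face Hub. The dictionary key `vdd_ris`, the helper names, and the target directory layout are hypothetical; the layout the preprocessing scripts expect is defined in the GitHub repo.

```python
from pathlib import Path

# Repo ids as listed above. The "vdd_ris" key is an assumed name that mirrors
# the "uavid_ris" value of --dataset in the test command.
TEXT_LABEL_REPOS = {
    "uavid_ris": "lironui/UAVid-RIS",
    "vdd_ris": "lironui/VDD-RIS",
}


def label_dir(dataset: str, root: str = "./data") -> Path:
    """Return a local directory for a dataset's text labels (hypothetical layout)."""
    repo_id = TEXT_LABEL_REPOS[dataset]
    return Path(root) / repo_id.split("/")[-1]


def download_text_labels(dataset: str, root: str = "./data") -> Path:
    """Download the text-label repo for `dataset` and return the local directory."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = label_dir(dataset, root)
    snapshot_download(
        repo_id=TEXT_LABEL_REPOS[dataset],
        repo_type="dataset",  # assumption: the labels are hosted as dataset repos
        local_dir=str(target),
    )
    return target


# Usage (requires network access):
#   print(download_text_labels("uavid_ris"))
```

The raw imagery is not covered by this sketch and still has to be obtained from the UAVid and VDD sources listed above.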

⚠️ Notes & Limitations

Checkpoints here are frozen artifacts; for training/fine-tuning or code changes, use the GitHub repo.
Performance depends on preprocessing, image size, and split strategy; follow the original scripts for reproducibility.
Auto-generated text expressions may contain ambiguity/noise.

🙏 Acknowledgements

AeroReformer builds upon:

LAVT
RMSIN

📜 Citation

@article{AeroReformer2025,
  title   = {AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation},
  author  = {<Author List>},
  journal = {arXiv preprint arXiv:2502.16680},
  year    = {2025}
}

🔐 License

Unless stated otherwise, the weights are distributed under the MIT License. Please also respect the licenses and terms of the underlying datasets and of the Swin-Transformer weights.
