AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation

This repository hosts pre-trained checkpoints for AeroReformer. For the full implementation and the training and evaluation scripts, see the official codebase:

GitHub: https://github.com/lironui/AeroReformer

Paper: AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation (arXiv:2502.16680)

AeroReformer is a vision-language framework for UAV-based referring image segmentation (UAV-RIS), featuring:

  • VLCAM: Vision-Language Cross-Attention Module for robust multimodal grounding.
  • RAMSF: Rotation-Aware Multi-Scale Fusion decoder for aerial scenes with large variations in scale and rotation.

📦 What’s in this repo?

Pre-trained model weights (PyTorch .pth) for:

  • UAVid-RIS (best checkpoint)
  • VDD-RIS (best checkpoint)

All training/inference code lives in the GitHub repo.


🚀 How to use these weights

1) Download with huggingface_hub
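A minimal download sketch, assuming the huggingface_hub package is installed. The repository id and checkpoint file names below are hypothetical placeholders, not the actual names used in this repo; replace them with the names shown in this repository's file listing. The helper name download_checkpoint is also illustrative.

```python
from typing import Optional

# Hypothetical placeholders: replace with the real Hub repo id and file names.
REPO_ID = "<namespace>/AeroReformer"
CHECKPOINT_FILES = {
    "uavid_ris": "<uavid_ris_checkpoint>.pth",  # UAVid-RIS best checkpoint
    "vdd_ris": "<vdd_ris_checkpoint>.pth",      # VDD-RIS best checkpoint
}


def download_checkpoint(dataset: str, cache_dir: Optional[str] = None) -> str:
    """Download one checkpoint from the Hub and return its local file path."""
    if dataset not in CHECKPOINT_FILES:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {sorted(CHECKPOINT_FILES)}"
        )
    # Imported lazily so the dataset-name check above works without the package.
    from huggingface_hub import hf_hub_download

    # hf_hub_download caches the file locally and returns the cached path.
    return hf_hub_download(
        repo_id=REPO_ID,
        filename=CHECKPOINT_FILES[dataset],
        cache_dir=cache_dir,
    )
```

Usage: path = download_checkpoint("uavid_ris") returns a local file path that can then be passed to --resume in the test command.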

2) Load into the official implementation

Use the model definitions and checkpoint-loading utilities from the GitHub repo.

Example (UAVid-RIS testing, run inside the code repo):

python test.py --swin_type base --dataset uavid_ris \
  --resume /path/to/model_best_AeroReformer.pth \
  --model_id AeroReformer --split test --workers 4 --window12 \
  --img_size 480 --refer_data_root ./data/UAVid_RIS/ --mha 4-4-4-4
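To inspect a checkpoint outside test.py, a sketch along these lines may help. It assumes, as is common in PyTorch training scripts, that the weights may be wrapped under a "model" or "state_dict" key and may carry a "module." prefix from DataParallel/DistributedDataParallel training; check the actual loading code in the GitHub repo before relying on it. The function names are illustrative and not part of the AeroReformer codebase.

```python
from typing import Any, Dict, Mapping

# Assumption: wrapper keys commonly used when saving PyTorch checkpoints.
WRAPPER_KEYS = ("model", "state_dict")
DDP_PREFIX = "module."


def extract_state_dict(checkpoint: Mapping[str, Any]) -> Dict[str, Any]:
    """Unwrap a saved checkpoint into a plain parameter-name -> tensor mapping."""
    for key in WRAPPER_KEYS:
        inner = checkpoint.get(key)
        if isinstance(inner, Mapping):
            checkpoint = inner
            break
    # Drop the "module." prefix added when a model is saved from a (D)DP wrapper.
    return {
        (name[len(DDP_PREFIX):] if name.startswith(DDP_PREFIX) else name): value
        for name, value in checkpoint.items()
    }


def load_state_dict(path: str) -> Dict[str, Any]:
    """Load a .pth file onto the CPU and return its unwrapped state dict."""
    import torch  # imported lazily; requires PyTorch

    # weights_only=False allows pickled non-tensor entries (e.g. saved args);
    # only use it for checkpoints from a source you trust.
    checkpoint = torch.load(path, map_location="cpu", weights_only=False)
    return extract_state_dict(checkpoint)
```

The model object itself still has to be built with the repo's model definitions; the returned dict is what you would pass to its load_state_dict method.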

✅ Recommended environment

  • Python: 3.10
  • PyTorch: 2.3.1
  • CUDA: 12.4 (or adapt to your setup)
  • OS: Ubuntu (verified by the authors)
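To compare a local setup with these recommendations, a small script like the one below can report the installed versions. It is an illustrative sketch, not a script from the AeroReformer repo; it treats patch releases and local build tags (for example 3.10.14 or 2.3.1+cu121) as matching, and the CUDA value it reports is the version PyTorch was built against, which can differ from the system toolkit.

```python
import platform
from typing import Dict, Optional

# Versions recommended in this model card.
RECOMMENDED = {"python": "3.10", "torch": "2.3.1", "cuda": "12.4"}


def version_matches(found: Optional[str], recommended: str) -> bool:
    """True if found equals recommended or extends it (e.g. 3.10.14, 2.3.1+cu121)."""
    if found is None:
        return False
    return found == recommended or found.startswith((recommended + ".", recommended + "+"))


def environment_versions() -> Dict[str, Optional[str]]:
    """Collect local Python, PyTorch, and PyTorch-CUDA versions (None if unavailable)."""
    versions: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
    }
    try:
        import torch
    except ImportError:
        return versions
    versions["torch"] = str(torch.__version__)
    versions["cuda"] = torch.version.cuda  # None for CPU-only builds
    return versions


def report() -> None:
    """Print each found version next to the recommended one."""
    versions = environment_versions()
    for name, recommended in RECOMMENDED.items():
        found = versions[name]
        status = "ok" if version_matches(found, recommended) else "differs"
        print(f"{name}: found {found}, recommended {recommended} ({status})")
```

Calling report() prints one line per component; a "differs" status is not necessarily a problem, only a hint of where results might diverge.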

The exact dependency setup and requirements.txt are maintained in the GitHub repo.


📂 Datasets

Experiments use UAVid-RIS and VDD-RIS. The referring expressions were generated automatically with the Qwen and LLaMA language models and may contain errors or inconsistencies.

Preprocessing and split scripts are provided in the GitHub repo.


⚠️ Notes & Limitations

  • The checkpoints here are frozen artifacts; for training, fine-tuning, or code changes, use the GitHub repo.
  • Performance depends on preprocessing, input image size, and split strategy; follow the original scripts to reproduce results.
  • Auto-generated referring expressions may be ambiguous or noisy.

🙏 Acknowledgements

AeroReformer builds upon:


📜 Citation

@article{AeroReformer2025,
  title   = {AeroReformer: Aerial Referring Transformer for UAV-Based Referring Image Segmentation},
  author  = {<Author List>},
  journal = {arXiv preprint arXiv:2502.16680},
  year    = {2025}
}

🔐 License

Unless stated otherwise, the weights are distributed under the MIT License. Please also respect the licenses and terms of use of the underlying datasets and of the Swin Transformer weights.

