--- license: mit --- # **AeroReformer: Aerial Referring Transformer for UAV-based Referring Image Segmentation** 🚀 **AeroReformer** is a novel framework for **UAV-based referring image segmentation (RIS)**, designed to address the unique challenges of aerial imagery, such as complex spatial scales, occlusions, and varying object orientations. Our method integrates **multi-head vision-language fusion (MHVLFM)** and **multi-scale rotation-aware fusion (MSRAFM)** to achieve superior segmentation performance compared to existing RIS approaches. The datasets and code will be made publicly available at our **[GitHub repository](https://github.com/lironui/AeroReformer)**. --- ## **📝 Paper Status** Our research paper detailing **AeroReformer** is currently in preparation and will be released soon. Stay tuned for updates! --- ## **📌 Model Overview** **AeroReformer** is a **transformer-based vision-language model** designed for **referring segmentation in UAV imagery**. It automatically **localizes and segments objects** based on **natural language descriptions**, overcoming the limitations of existing RIS models in aerial datasets. ### **🔹 Key Features** ✅ **Automatic Annotation Pipeline**: Utilizes open-source UAV segmentation datasets and large language models (LLMs) to generate textual descriptions. ✅ **Multi-Head Vision-Language Fusion (MHVLFM)**: Enhances cross-modal understanding for precise segmentation. ✅ **Multi-Scale Rotation-Aware Fusion (MSRAFM)**: Improves robustness to aerial scene variations. ✅ **State-of-the-Art Performance**: Sets a new benchmark in UAV-based referring segmentation on multiple datasets. ---