DGTRS-CLIP-ViT-B-16
DGTRS-CLIP-ViT-B-16 is a CLIP model with a ViT-B/16 image encoder. It can be used for zero-shot image classification and text-image retrieval.
This model is compatible with both the transformers and diffusers libraries.
How to use
With transformers
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

model = CLIPModel.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-B-16")
processor = CLIPProcessor.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-B-16")

# Zero-shot classification: score an image against candidate text labels
image = Image.open("example.jpg")  # replace with a path to your image
inputs = processor(text=["an aerial image of an airport", "an aerial image of a forest"], images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)  # per-label probabilities
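For text-image retrieval, the same model can rank a gallery of images against a text query. Below is a minimal sketch; the image paths and the query string are placeholders to replace with your own data.

import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-B-16")
processor = CLIPProcessor.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-B-16")

# Rank a small set of images against one text query
images = [Image.open(p) for p in ["img1.jpg", "img2.jpg", "img3.jpg"]]  # placeholder paths
inputs = processor(text=["a harbor with ships"], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_text  # shape: (num_texts, num_images)
ranking = logits.squeeze(0).argsort(descending=True)  # image indices, best match first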
With diffusers
This model's text encoder can be loaded on its own and passed to a Stable Diffusion pipeline from diffusers, as sketched below.
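A minimal sketch of swapping the text encoder into a pipeline follows. The base checkpoint name is only an example, and note that Stable Diffusion v1.x expects a 768-dimensional (ViT-L/14-style) text encoder, so a ViT-B/16 text tower is only a drop-in replacement for a diffusion model whose cross-attention dimension matches its hidden size.

from diffusers import StableDiffusionPipeline
from transformers import CLIPTextModel, CLIPTokenizer

# Load only the text tower and tokenizer from the CLIP checkpoint
text_encoder = CLIPTextModel.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-B-16")
tokenizer = CLIPTokenizer.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-B-16")

# Swap them into an existing pipeline; the base model here is only an example,
# and its UNet cross-attention dimension must match this text encoder's hidden size
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    text_encoder=text_encoder,
    tokenizer=tokenizer,
)
image = pipe("a satellite view of a coastal city").images[0]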
Citation
If you use this model in your research, please cite the original paper:
@article{chen2025lrsclip,
  title={LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text},
  author={Chen, Weizhi and Chen, Jingbo and Deng, Yupeng and Chen, Jiansheng and Feng, Yuman and Xi, Zhihao and Liu, Diyou and Li, Kai and Meng, Yu},
  journal={arXiv preprint arXiv:2503.19311},
  year={2025}
}