DGTRS-CLIP-ViT-B-16

DGTRS-CLIP-ViT-B-16 is a CLIP-style vision-language model with a ViT-B/16 image encoder for aligning remote sensing images with text (see the citation below). It can be used for tasks such as zero-shot image classification and text-image retrieval.

This model is compatible with both the transformers and diffusers libraries.

How to use

With transformers

from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-B-16")
processor = CLIPProcessor.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-B-16")

# Zero-shot classification: score an image against candidate captions
# (the image path and the captions below are placeholders; replace with your own)
image = Image.open("example.jpg")
texts = ["a satellite image of an airport", "a satellite image of a forest"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)  # caption probabilities
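
For text-image retrieval, the image and text towers can also be used separately to precompute embeddings and rank matches by cosine similarity. A minimal sketch, assuming placeholder file names and a placeholder query caption:

import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-B-16")
processor = CLIPProcessor.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-B-16")

# Embed a small gallery of images and one text query (placeholder inputs)
gallery = [Image.open(p) for p in ["scene1.jpg", "scene2.jpg"]]
image_inputs = processor(images=gallery, return_tensors="pt")
text_inputs = processor(text=["an aerial view of a harbor"], return_tensors="pt", padding=True)

with torch.no_grad():
    image_embeds = model.get_image_features(**image_inputs)
    text_embeds = model.get_text_features(**text_inputs)

# Cosine similarity between the query and every gallery image
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
scores = text_embeds @ image_embeds.T  # higher score = better match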

With diffusers

This model's text encoder can be used as the text encoder of a Stable Diffusion pipeline.
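
Below is a minimal sketch. It assumes the base diffusion checkpoint (stable-diffusion-v1-5/stable-diffusion-v1-5, chosen here only as an example) expects text embeddings with the same hidden size as this encoder produces; if the hidden sizes differ, the pipeline's cross-attention layers will not accept the swapped encoder.

from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import StableDiffusionPipeline

# Load only the text tower and tokenizer from this checkpoint
text_encoder = CLIPTextModel.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-B-16")
tokenizer = CLIPTokenizer.from_pretrained("BiliSakura/DGTRS-CLIP-ViT-B-16")

# Swap them into a Stable Diffusion pipeline (base checkpoint is only an example)
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    text_encoder=text_encoder,
    tokenizer=tokenizer,
)
image = pipe("an aerial image of an airport runway").images[0]  # placeholder prompt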

Citation

If you use this model in your research, please cite the original paper:

@article{chen2025lrsclip,
  title={LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text},
  author={Chen, Weizhi and Chen, Jingbo and Deng, Yupeng and Chen, Jiansheng and Feng, Yuman and Xi, Zhihao and Liu, Diyou and Li, Kai and Meng, Yu},
  journal={arXiv preprint arXiv:2503.19311},
  year={2025}
}