# Model Card for CLIC-ViT-B-32-224-PixPr-RedCaps

## Model Details

- **Model details:** Fine-tuned with CLIC using the PixelProse dataset

## Model Usage

### With OpenCLIP

```
import torch
from urllib.request import urlopen

from PIL import Image
import open_clip

model, _, image_processor = open_clip.create_model_and_transforms('hf-hub:nmndeep/CLIC-ViT-B-32-224-PixPr-RedCaps')
tokenizer = open_clip.get_tokenizer('hf-hub:nmndeep/CLIC-ViT-B-32-224-PixPr-RedCaps')
model.eval()

# Load and preprocess the query image
image = image_processor(Image.open(urlopen(
    'https://images.pexels.com/photos/869258/pexels-photo-869258.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1'
))).unsqueeze(0)

# Candidate labels for zero-shot classification
texts = ["a diagram", "a dog", "a cat", "snow"]
text = tokenizer(texts)

with torch.no_grad(), torch.autocast("cuda"):
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so the dot product below is a cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

idx = torch.argmax(text_probs)
print("Output label:", texts[idx])
```
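The scores are cosine similarities between the image and text embeddings, scaled by 100 (roughly CLIP's learned logit scale) and passed through a softmax; the printed label is the candidate with the highest probability.

The snippet above runs on CPU by default. If a GPU is available, the model and inputs can be moved to it before encoding; a minimal sketch, assuming a CUDA device:

```
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
image = image.to(device)
text = text.to(device)
```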