Contrastive Booru-Image Embedding Model
The ViT-B image encoder was replaced by a different encoder.
Images were tagged with the CL tagger, and captions (prompts) were tokenized with the ModernBERT tokenizer.
Objectives
Image-Caption Similarity Scoring: The model scores each image by how closely it matches its associated caption.
Image Labeling: The image embedding is used to label images by character name.
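The two objectives above can be illustrated with a small sketch. It assumes the embeddings have already been computed and are plain lists of floats, that similarity is measured with cosine similarity, and that labeling picks the character whose reference embedding is closest to the image embedding. None of these details are confirmed by this model card; the function names, the toy vectors, and the placeholder character names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images_by_caption(
    caption_emb: Sequence[float],
    image_embs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every image against one caption embedding, best match first."""
    scored = [
        (name, cosine_similarity(emb, caption_emb))
        for name, emb in image_embs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def label_by_character(
    image_emb: Sequence[float],
    character_embs: Dict[str, Sequence[float]],
) -> Tuple[str, float]:
    """Return the character whose reference embedding is closest to the image.

    Assumption: one reference embedding per character name is available.
    """
    scored = (
        (name, cosine_similarity(image_emb, emb))
        for name, emb in character_embs.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Toy 3-dimensional embeddings; real embeddings would come from the model.
caption = [1.0, 0.0, 0.0]
images = {"img_a": [0.9, 0.1, 0.0], "img_b": [0.0, 1.0, 0.0]}
ranking = rank_images_by_caption(caption, images)

characters = {"character_x": [0.0, 1.0, 0.0], "character_y": [1.0, 0.0, 0.0]}
label, score = label_by_character([0.1, 0.9, 0.0], characters)
```

Here `ranking` lists `img_a` before `img_b`, and the labeled character is `character_x`. In practice a score threshold could be added so that images far from every reference embedding are left unlabeled.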
Updates
This version covers up to 4,000 female characters drawn from public and private datasets.