CLIP_TROHN-Text is a model presented in the BiVLC paper for experimentation. It was fine-tuned with the OpenCLIP framework, starting from the CLIP ViT-B-32 model pre-trained by OpenAI ('openai' weights). The goal of this fine-tuning is to improve the model's compositional understanding by adding negative captions during training, where each negative differs from the original caption by a small compositional change. Hyperparameters:
The model is evaluated on BiVLC.
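The card states that negative captions are added during fine-tuning but does not spell out the loss. As a hedged illustration only, the sketch below assumes a NegCLIP-style contrastive objective: in the image-to-text direction each image must score its own caption above all other captions in the batch *and* above every perturbed negative caption, while the text-to-image direction stays standard CLIP. It uses NumPy instead of PyTorch/OpenCLIP to stay self-contained; the function names, the one-negative-per-image layout and the temperature of 0.07 are assumptions, not the authors' training code.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm, as CLIP does before computing similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def contrastive_loss_with_negatives(img, pos_txt, neg_txt, temperature=0.07):
    """Hypothetical CLIP-style loss with extra negative captions (sketch only).

    img, pos_txt, neg_txt: arrays of shape (N, D). Row i of pos_txt describes
    image i; row i of neg_txt is a compositionally perturbed caption for image i.
    """
    img, pos_txt, neg_txt = map(l2_normalize, (img, pos_txt, neg_txt))
    n = img.shape[0]
    targets = np.arange(n)

    # Image-to-text: candidates are the N positives followed by the N negatives,
    # so the correct caption for image i sits in column i.
    all_txt = np.concatenate([pos_txt, neg_txt], axis=0)  # (2N, D)
    logits_i2t = img @ all_txt.T / temperature  # (N, 2N)
    loss_i2t = -log_softmax(logits_i2t)[targets, targets].mean()

    # Text-to-image: only positive captions act as queries (standard CLIP direction).
    logits_t2i = pos_txt @ img.T / temperature  # (N, N)
    loss_t2i = -log_softmax(logits_t2i)[targets, targets].mean()

    # Symmetric average of the two directions, as in CLIP.
    return (loss_i2t + loss_t2i) / 2


# Usage with random embeddings standing in for encoder outputs.
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
pos = img + 0.1 * rng.normal(size=(4, 8))  # captions close to their images
neg = rng.normal(size=(4, 8))  # unrelated stand-ins for perturbed captions
print(contrastive_loss_with_negatives(img, pos, neg))
```

Under this assumed formulation, a negative that is nearly identical to the positive caption keeps the image-to-text term high, which is what pushes the encoders to separate captions that differ only in composition.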
This work is licensed under the MIT License.
If you find this model useful, please consider citing our paper:
@misc{miranda2024bivlc,
      title={BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval}, 
      author={Imanol Miranda and Ander Salaberria and Eneko Agirre and Gorka Azkune},
      year={2024},
      eprint={2406.09952},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}