Ross Wightman
committed on
Commit · e0a996a
1 Parent(s): 923c8d7
Update README add tokenizer/vocab/preprocessor cfg
- README.md +19 -8
- preprocessor_config.json +19 -0
- special_tokens_map.json +1 -0
- tokenizer.json +0 -0
- tokenizer_config.json +1 -0
- vocab.json +0 -0
README.md
CHANGED
@@ -6,11 +6,12 @@ license: mit
# Table of Contents

1. [Model Details](#model-details)
-
-
-
-
-
+2. [Uses](#uses)
+3. [Training Details](#training-details)
+4. [Evaluation](#evaluation)
+5. [Acknowledgements](#acknowledgements)
+6. [Citation](#citation)
+7. [How To Get Started With the Model](#how-to-get-started-with-the-model)


# Model Details
@@ -19,9 +20,11 @@ license: mit

A CLIP ViT-g/14 model trained with the LAION-2B English subset of LAION-5B (https://laion.ai/blog/laion-5b/) using OpenCLIP (https://github.com/mlfoundations/open_clip).

+Model training done by Romain Beaumont on the [stability.ai](https://stability.ai/) cluster.
+
# Uses

-As per the original OpenAI CLIP
+As per the original [OpenAI CLIP model card](https://github.com/openai/CLIP/blob/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1/model-card.md), this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models.

The OpenAI CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. Additionally, the LAION-5B blog (https://laion.ai/blog/laion-5b/) and upcoming paper include additional discussion as it relates specifically to the training dataset.

@@ -55,7 +58,7 @@ This model was trained with the 2 Billion sample English subset of LAION-5B (htt

## Training Procedure

-
+Please see [training notes](https://docs.google.com/document/d/1EFbMLRWSSV0LUf9Du1pWzWqgeiIRPwEWX2s1C6mAk5c) and [wandb logs](https://wandb.ai/rom1504/eval_openclip/reports/slow-g-14--VmlldzoyNTMwMjg5).

# Evaluation

@@ -71,7 +74,15 @@ The testing is performed with VTAB+ (A combination of VTAB (https://arxiv.org/ab

## Results

-
+The model achieves a 76.6% zero-shot top-1 accuracy on ImageNet-1k.
+
+An initial round of benchmarks has been performed on a wider range of datasets, currently viewable at https://github.com/LAION-AI/CLIP_benchmark/blob/main/benchmark/results.ipynb
+
+**TODO** - create table for just this model's metrics.
+
+# Acknowledgements
+
+Acknowledging [stability.ai](https://stability.ai/) for the compute used to train this model.

# Citation

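The card above quotes a 76.6% zero-shot top-1 accuracy on ImageNet-1k for this ViT-g/14 checkpoint. For orientation, the sketch below shows the usual OpenCLIP zero-shot classification pattern; the `ViT-g-14` model name and the `laion2b_s12b_b42k` pretrained tag are assumptions about how this release is registered, so check `open_clip.list_pretrained()` for the exact identifiers.

```python
# Minimal zero-shot classification sketch with OpenCLIP.
# The model/pretrained identifiers below are assumed, not taken from this commit;
# verify them with open_clip.list_pretrained().
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-g-14", pretrained="laion2b_s12b_b42k"  # assumed identifiers for this release
)
tokenizer = open_clip.get_tokenizer("ViT-g-14")
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # any RGB image
text = tokenizer(["a diagram", "a dog", "a cat"])           # candidate captions

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine-normalize, then turn similarities into probabilities over captions.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # highest probability for the caption that best matches the image
```

The preprocessor and tokenizer configs added in this commit appear intended to let the corresponding Hugging Face `transformers` components (feature extractor and tokenizer) be loaded straight from the repo; sketches for both follow their respective files below.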
preprocessor_config.json
ADDED
@@ -0,0 +1,19 @@
+{
+  "crop_size": 224,
+  "do_center_crop": true,
+  "do_normalize": true,
+  "do_resize": true,
+  "feature_extractor_type": "CLIPFeatureExtractor",
+  "image_mean": [
+    0.48145466,
+    0.4578275,
+    0.40821073
+  ],
+  "image_std": [
+    0.26862954,
+    0.26130258,
+    0.27577711
+  ],
+  "resample": 3,
+  "size": 224
+}
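For reference, the preprocessing this config describes (resize to 224, center crop to 224, normalize with the CLIP mean/std; `"resample": 3` is PIL's bicubic filter) can be approximated with torchvision as below. This is an illustrative equivalent, not a pipeline shipped in the repo.

```python
# Approximate torchvision equivalent of preprocessor_config.json above.
# "resample": 3 corresponds to bicubic interpolation in PIL.
from torchvision import transforms
from torchvision.transforms import InterpolationMode

clip_preprocess = transforms.Compose([
    transforms.Resize(224, interpolation=InterpolationMode.BICUBIC),  # do_resize / size
    transforms.CenterCrop(224),                                       # do_center_crop / crop_size
    transforms.ToTensor(),                                            # PIL image -> float tensor in [0, 1]
    transforms.Normalize(                                             # do_normalize
        mean=[0.48145466, 0.4578275, 0.40821073],                     # image_mean
        std=[0.26862954, 0.26130258, 0.27577711],                     # image_std
    ),
])
```

In practice, `transformers` reads this file itself (e.g. via `CLIPFeatureExtractor.from_pretrained` pointed at the repo), which is presumably why it is being added here.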
special_tokens_map.json
ADDED
@@ -0,0 +1 @@
+{"bos_token": {"content": "<|startoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "eos_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "unk_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "pad_token": "<|endoftext|>"}
tokenizer.json
ADDED
The diff for this file is too large to render; see the raw diff.
tokenizer_config.json
ADDED
@@ -0,0 +1 @@
+{"unk_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "bos_token": {"content": "<|startoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "pad_token": "<|endoftext|>", "add_prefix_space": false, "errors": "replace", "do_lower_case": true, "name_or_path": "./clip_ViT_B_32/"}
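The tokenizer files in this commit (tokenizer_config.json and special_tokens_map.json above, plus the tokenizer.json and vocab.json files listed in the header) describe the standard CLIP byte-pair-encoding tokenizer, with `<|startoftext|>`/`<|endoftext|>` as BOS/EOS and `<|endoftext|>` doubling as the pad and unknown token. A minimal loading sketch with `transformers` follows; the repository id is a placeholder, and the 77-token context length is the usual CLIP default rather than something stated in this diff.

```python
# Sketch: loading the tokenizer files from this commit with Hugging Face transformers.
# "your-org/your-clip-repo" is a placeholder; point it at the actual model repo
# (or a local directory containing the files above).
from transformers import CLIPTokenizerFast

tokenizer = CLIPTokenizerFast.from_pretrained("your-org/your-clip-repo")

enc = tokenizer(
    ["a photo of a cat", "a photo of a dog"],
    padding="max_length",   # pads with <|endoftext|>, per special_tokens_map.json
    max_length=77,          # usual CLIP text context length (assumed, not stated here)
    truncation=True,
    return_tensors="pt",
)
print(enc.input_ids.shape)                                    # torch.Size([2, 77])
print(tokenizer.bos_token, tokenizer.eos_token, tokenizer.pad_token)
```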
vocab.json
ADDED
The diff for this file is too large to render; see the raw diff.