README.md CHANGED
@@ -1,3 +1,432 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license:
3
+ - mit
4
+ language:
5
+ - en
6
+ library_name: open_clip
7
+ tags:
8
+ - biology
9
+ - CV
10
+ - images
11
+ - imageomics
12
+ - clip
13
+ - species-classification
14
+ - biological visual task
15
+ - multimodal
16
+ - animals
17
+ - species
18
+ - taxonomy
19
+ - rare species
20
+ - endangered species
21
+ - evolutionary biology
22
+ - knowledge-guided
23
+ - zero-shot-image-classification
24
+ datasets:
25
+ - imageomics/TreeOfLife-200M
26
+ - GBIF
27
+ - bioscan-ml/BIOSCAN-5M
28
+ - EOL
29
+ - FathomNet
30
+ ---
31
+
32
+ <!--
33
+ Image with caption (jpg or png):
34
+ |![Figure #](https://huggingface.co/imageomics/<model-repo>/resolve/main/<filepath>)|
35
+ |:--|
36
+ |**Figure #.** [Image of <>](https://huggingface.co/imageomics/<model-repo>/raw/main/<filepath>) <caption description>.|
37
+ -->
38
+
39
+ <!--
40
+ Notes on styling:
41
+
42
+ To render LaTex in your README, wrap the code in `\\(` and `\\)`. Example: \\(\frac{1}{2}\\)
43
+
44
+ Escape underscores ("_") with a "\". Example: image\_RGB
45
+ -->
46
+
47
+ # Model Card for BioCLIP 2
48
+
49
+ BioCLIP 2 is a foundation model for organismal biology images. It is trained on [TreeOfLife-200M](https://huggingface.co/datasets/imageomics/TreeOfLife-200M), starting from a [CLIP](https://huggingface.co/laion/CLIP-ViT-L-14-laion2B-s32B-b82K) model (ViT-L/14) pre-trained on LAION-2B.
50
+ BioCLIP 2 yields state-of-the-art performance in recognizing various species. More importantly, it demonstrates emergent properties beyond species classification after extensive hierarchical contrastive training.
51
+
52
+ ## Model Details
53
+
54
+ ### Model Description
55
+
56
+ Foundation models trained at scale exhibit emergent properties beyond their initial training objectives.
57
+ BioCLIP 2 demonstrates such emergence beyond species classification by scaling up the hierarchical contrastive training proposed by [BioCLIP](https://imageomics.github.io/bioclip/).
58
+ The model is trained on [TreeOfLife-200M](https://huggingface.co/datasets/imageomics/TreeOfLife-200M) (the largest and most diverse available dataset of biology images).
59
+ We evaluate BioCLIP 2 on a diverse set of biological tasks. Through training at scale, BioCLIP 2 improves species classification by 18.1% over BioCLIP. More importantly, we demonstrate that BioCLIP 2 generalizes to diverse biological questions beyond species classification despite being trained solely with species-level supervision. Further analysis reveals that BioCLIP 2 acquires two emergent properties through scaling up hierarchical contrastive learning: inter-species ecological alignment and intra-species variation separation.
60
+
61
+ - **Developed by:** Jianyang Gu, Samuel Stevens, Elizabeth G Campolongo, Matthew J Thompson, Net Zhang, Jiaman Wu, Andrei Kopanev, Zheda Mai, Alexander E. White, James Balhoff, Wasila M Dahdul, Daniel Rubenstein, Hilmar Lapp, Tanya Berger-Wolf, Wei-Lun Chao, and Yu Su
62
+ - **Model type:** The model uses a ViT-L/14 Transformer as the image encoder and a masked self-attention Transformer as the text encoder.
63
+ - **License:** MIT
64
+ - **Fine-tuned from model:** CLIP pre-trained on LAION-2B, ViT-L/14 ([Model weight](https://huggingface.co/laion/CLIP-ViT-L-14-laion2B-s32B-b82K))
65
+
66
+ ### Model Sources
67
+
68
+ <!-- Provide the basic links for the model. -->
69
+
70
+ - **Repository:** [BioCLIP 2](https://github.com/Imageomics/bioclip-2)
71
+ - **Paper:** [BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning]()
72
+ - **Demo:** Coming soon
73
+
74
+ ## Uses
75
+
76
+ ### Direct Use
77
+
78
+ The model can be used for zero-shot species classification when provided with candidate species names.
79
+ It can also be used for few-shot classification, with a small set of labeled images serving as the support set (a minimal sketch follows this paragraph).
80
+ Additionally, BioCLIP 2 is recommended as a visual encoder for other biological vision tasks.
81
+
82
+ ## Bias, Risks, and Limitations
83
+
84
+ BioCLIP 2 is trained on an imbalanced dataset. Specifically, the TreeOfLife-200M dataset exhibits a long-tailed distribution across taxa.
85
+ Therefore, the predictions of BioCLIP 2 might be biased toward well-represented species. For more details, see the [discussion in the TreeOfLife-200M dataset card](https://huggingface.co/datasets/imageomics/TreeOfLife-200M#considerations-for-using-the-data).
86
+
87
+ BioCLIP 2 and TreeOfLife-200M have great potential to enhance existing conservation efforts, in particular by facilitating recognition of threatened species.
88
+ Unfortunately, as with many open-source efforts to further conservation goals, there is also potential for bad actors to make use of these tools for malicious purposes. Though improved recognition of threatened species could make it easier for poachers to identify protected species, these tools are also a force multiplier for monitoring illicit trade in those same species. The primary risk to endangered species comes from the disclosure of precise location information rather than from improved classification capability. Our data does not include geo-tagged information for the organisms pictured, minimizing vulnerabilities that could be exploited for poaching.
89
+
90
+ <!--
91
+ ### Recommendations
92
+
93
+ This section is meant to convey recommendations with respect to the bias, risk, and technical limitations.
94
+
95
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
96
+ -->
97
+
98
+ ## How to Get Started with the Model
99
+
100
+ You can use the `open_clip` library to load BioCLIP 2.
101
+
102
+ ```python
103
+ import open_clip
104
+ model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:imageomics/bioclip-2')
105
+ tokenizer = open_clip.get_tokenizer('hf-hub:imageomics/bioclip-2')
106
+ ```
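+
+ A minimal zero-shot classification sketch follows. The image path and candidate species names are placeholders, and the `"a photo of <name>"` prompt template is just one reasonable choice; adapt them to your own data.
+
+ ```python
+ import torch
+ from PIL import Image
+
+ import open_clip
+
+ model, _, preprocess_val = open_clip.create_model_and_transforms('hf-hub:imageomics/bioclip-2')
+ tokenizer = open_clip.get_tokenizer('hf-hub:imageomics/bioclip-2')
+ model.eval()
+
+ # Placeholder inputs: substitute your own image and candidate species names.
+ image = preprocess_val(Image.open('example.jpg')).unsqueeze(0)
+ candidates = ['Danaus plexippus', 'Papilio machaon', 'Vanessa atalanta']
+ text = tokenizer([f'a photo of {name}' for name in candidates])
+
+ with torch.no_grad():
+     image_features = model.encode_image(image)
+     text_features = model.encode_text(text)
+     image_features = image_features / image_features.norm(dim=-1, keepdim=True)
+     text_features = text_features / text_features.norm(dim=-1, keepdim=True)
+     probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
+
+ for name, prob in zip(candidates, probs[0].tolist()):
+     print(f'{name}: {prob:.3f}')
+ ```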
107
+
108
+ ## Training Details
109
+
110
+ ### Training Data
111
+
112
+ The model was trained with [TreeOfLife-200M](https://huggingface.co/datasets/imageomics/TreeOfLife-200M).
113
+ The dataset consists of nearly 214M images covering 952K taxa.
114
+ The scale of TreeOfLife-200M fosters the emergent properties of BioCLIP 2.
115
+
116
+ In addition, we used a 26M-sample subset of LAION-2B for experience replay.
117
+ This portion of the data was downloaded from the first three parquet metadata files of LAION-2B, and the first 800 tar files were used.
118
+
119
+ ### Training Procedure
120
+
121
+ #### Preprocessing
122
+
123
+ Standard CLIP image preprocessing was adopted during training.
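+
+ For reference, the validation-time transform is roughly equivalent to the torchvision pipeline below, using the image size and normalization statistics recorded in `open_clip_config.json` in this repository. This is an illustrative sketch; in practice, use the `preprocess_train` / `preprocess_val` transforms returned by `open_clip.create_model_and_transforms`.
+
+ ```python
+ from torchvision import transforms
+
+ # Values mirror open_clip_config.json: 224 x 224 input, bicubic resize of the
+ # shortest side, center crop, and the standard CLIP normalization statistics.
+ clip_preprocess = transforms.Compose([
+     transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
+     transforms.CenterCrop(224),
+     transforms.ToTensor(),
+     transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
+                          std=(0.26862954, 0.26130258, 0.27577711)),
+ ])
+ ```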
124
+
125
+ #### Training Hyperparameters
126
+
127
+ - **Training regime:** bf16 mixed precision
128
+
129
+ We used the Adam optimizer with a maximum learning rate of 1e-4, with 1,875 warm-up steps followed by cosine decay.
130
+ The batch size of biological images was 2,816 per GPU, and that of replay data was 320 per GPU.
131
+ We trained the model on 32 GPUs for 30 epochs, with a weight decay of 0.2.
132
+ Each input image was resized to 224 x 224 resolution.
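+
+ The snippet below illustrates the learning-rate schedule described above (linear warm-up for 1,875 steps followed by cosine decay) with a PyTorch optimizer. It is a simplified sketch, not the actual training script: the `total_steps` value and the stand-in module are placeholders, and `AdamW` (decoupled weight decay) is an assumption on our part.
+
+ ```python
+ import math
+
+ import torch
+
+ def cosine_with_warmup(step: int, warmup_steps: int = 1875, total_steps: int = 100_000) -> float:
+     """Scale factor applied to the base learning rate at a given step."""
+     if step < warmup_steps:
+         return step / max(1, warmup_steps)  # linear warm-up
+     progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
+     return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay to zero
+
+ model = torch.nn.Linear(768, 768)  # placeholder module standing in for the CLIP towers
+ optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.2)
+ scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=cosine_with_warmup)
+
+ # In a training loop: call optimizer.step() and then scheduler.step() once per batch.
+ ```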
133
+
134
+ ## Evaluation
135
+
136
+ We evaluated the model on both species classification and other biological visual tasks.
137
+
138
+ ### Testing Data
139
+
140
+ For species classification tasks, we tested BioCLIP 2 on the following 10 tasks:
141
+ * [NABirds](https://dl.allaboutbirds.org/nabirds): We used the 555 visual categories, with 48,640 images for testing.
142
+ * [Meta-Album](https://meta-album.github.io/): We used the Plankton, Insects, Insects2, PlantNet, Fungi, PlantVillage, and Medicinal Leaf datasets from Meta-Album.
143
+ * [IDLE-OO Camera Traps](https://huggingface.co/datasets/imageomics/IDLE-OO-Camera-Traps): Species identification in camera trap images is a real-world scenario that BioCLIP 2 can be applied to.
144
+ We collected a class-balanced test set from five LILA-BC camera trap datasets. For more information on this test set, please visit the [dataset page](https://huggingface.co/datasets/imageomics/IDLE-OO-Camera-Traps).
145
+ * [Rare Species](https://huggingface.co/datasets/imageomics/rare-species): This dataset was introduced in the first BioCLIP paper.
146
+ It consists of 400 species labeled Near Threatened through Extinct in the Wild by the [IUCN Red List](https://www.iucnredlist.org/), with 30 images per species.
147
+ Top-1 accuracy is reported for both zero-shot and few-shot experiments.
148
+
149
+ For biological visual tasks beyond species classification, we used:
150
+ * [FishNet](https://fishnet-2023.github.io/): We used the original training set (75,631 images) to train a two-layer linear classifier on top of the extracted features to predict the feeding path and habitat labels (a sketch of such a classifier head follows this list).
151
+ Then we tested the classifier on the 18,901 images of the test set. Accuracy is reported, where a sample counts as correct only if all 9 labels are predicted correctly.
152
+ * [NeWT](https://github.com/visipedia/newt): We used the 164 binary classification tasks proposed in the dataset. Micro-accuracy is reported across all the samples.
153
+ * [AwA2](https://cvml.ista.ac.at/AwA2/): We used the original train-test split for attribute classification. Macro-F1 score is reported across all the attributes.
154
+ * [Herbarium19](https://www.kaggle.com/c/herbarium-2019-fgvc6): This task is to discover new species; we implement it as semi-supervised clustering. Clustering accuracy is calculated for the predictions on both seen and unseen classes.
155
+ * [PlantDoc](https://github.com/pratikkayal/PlantDoc-Dataset): This dataset includes 2,598 images of 13 plant species with up to 17 disease classes. We conducted the experiment in a multi-fold 1-shot learning setting. Average accuracy over the test samples is reported.
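+
+ As referenced in the FishNet entry above, the sketch below shows one way to attach a two-layer classifier head to frozen BioCLIP 2 image features for multi-label prediction. The hidden width, the sigmoid-based loss, and the random placeholder tensors are illustrative assumptions rather than the exact configuration used in the paper.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class TwoLayerHead(nn.Module):
+     """Two-layer classifier over frozen image features (768-d for ViT-L/14)."""
+
+     def __init__(self, feat_dim: int = 768, hidden_dim: int = 512, num_labels: int = 9):
+         super().__init__()
+         self.net = nn.Sequential(
+             nn.Linear(feat_dim, hidden_dim),
+             nn.ReLU(),
+             nn.Linear(hidden_dim, num_labels),
+         )
+
+     def forward(self, features: torch.Tensor) -> torch.Tensor:
+         return self.net(features)  # raw logits, one per label
+
+ # Multi-label training with a sigmoid-based loss over the 9 binary labels.
+ head = TwoLayerHead()
+ criterion = nn.BCEWithLogitsLoss()
+ features = torch.randn(4, 768)                 # placeholder extracted features
+ targets = torch.randint(0, 2, (4, 9)).float()  # placeholder labels
+ loss = criterion(head(features), targets)
+ ```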
156
+
157
+ More details regarding the evaluation implementation can be found in the [paper]().
158
+
159
+ ### Results
160
+ We show the zero-shot classification and non-species classification task results here. For more detailed results, please check the [paper]().
161
+ <table cellpadding="0" cellspacing="0">
162
+ <thead>
163
+ <tr>
164
+ <th rowspan="2">Model</th>
165
+ <th colspan="5">Animals</th>
166
+ <th colspan="4">Plants & Fungi</th>
167
+ <th rowspan="2">Rare Species</th>
168
+ <th rowspan="2">Mean</th>
169
+ </tr>
170
+ <tr>
171
+ <th>NABirds</th>
172
+ <th>Plankton</th>
173
+ <th>Insects</th>
174
+ <th>Insects 2</th>
175
+ <th>Camera Trap</th>
176
+ <th>PlantNet</th>
177
+ <th>Fungi</th>
178
+ <th>PlantVillage</th>
179
+ <th>Med. Leaf</th>
180
+ </tr>
181
+ </thead>
182
+ <tbody>
183
+ <tr>
184
+ <td>CLIP (ViT-L/14)</td>
185
+ <td>66.5</td>
186
+ <td>1.3</td>
187
+ <td>9.0</td>
188
+ <td>11.7</td>
189
+ <td>29.5</td>
190
+ <td>61.7</td>
191
+ <td>7.6</td>
192
+ <td>6.5</td>
193
+ <td>25.6</td>
194
+ <td>35.2</td>
195
+ <td>25.5</td>
196
+ </tr>
197
+ <tr>
198
+ <td>SigLIP</td>
199
+ <td>61.7</td>
200
+ <td>2.4</td>
201
+ <td>27.3</td>
202
+ <td>20.7</td>
203
+ <td>33.7</td>
204
+ <td>81.8</td>
205
+ <td>36.9</td>
206
+ <td><b>28.5</b></td>
207
+ <td>54.5</td>
208
+ <td>47.6</td>
209
+ <td>39.5</td>
210
+ </tr>
211
+ <tr>
212
+ <td>BioTrove-CLIP</td>
213
+ <td>39.4</td>
214
+ <td>1.0</td>
215
+ <td>20.5</td>
216
+ <td>15.7</td>
217
+ <td>10.7</td>
218
+ <td>64.4</td>
219
+ <td>38.2</td>
220
+ <td>15.7</td>
221
+ <td>31.6</td>
222
+ <td>24.6</td>
223
+ <td>26.2</td>
224
+ </tr>
225
+ <tr>
226
+ <td>BioCLIP</td>
227
+ <td>58.8</td>
228
+ <td><b>6.1</b></td>
229
+ <td>34.9</td>
230
+ <td>20.5</td>
231
+ <td>31.7</td>
232
+ <td>88.2</td>
233
+ <td>40.9</td>
234
+ <td>19.0</td>
235
+ <td>38.5</td>
236
+ <td>37.1</td>
237
+ <td>37.6</td>
238
+ </tr>
239
+ <tr>
240
+ <td>BioCLIP 2</td>
241
+ <td><b>74.9</b></td>
242
+ <td>3.9</td>
243
+ <td><b>55.3</b></td>
244
+ <td><b>27.7</b></td>
245
+ <td><b>53.9</b></td>
246
+ <td><b>96.8</b></td>
247
+ <td><b>83.8</b></td>
248
+ <td>25.1</td>
249
+ <td><b>57.8</b></td>
250
+ <td><b>76.8</b></td>
251
+ <td><b>55.6</b></td>
252
+ </tr>
253
+ </tbody>
254
+ </table>
255
+
256
+ <table cellpadding="0" cellspacing="0">
257
+ <thead>
258
+ <tr>
259
+ <th rowspan="2">Model</th>
260
+ <th colspan="3">Animals</th>
261
+ <th colspan="2">Plants</th>
262
+ <th rowspan="2">Mean</th>
263
+ </tr>
264
+ <tr>
265
+ <th>FishNet</th>
266
+ <th>NeWT</th>
267
+ <th>AwA2</th>
268
+ <th>Herbarium19</th>
269
+ <th>PlantDoc</th>
270
+ </tr>
271
+ </thead>
272
+ <tbody>
273
+ <tr>
274
+ <td>CLIP (ViT-L/14)</td>
275
+ <td>27.9</td>
276
+ <td>83.4</td>
277
+ <td>61.6</td>
278
+ <td>18.2</td>
279
+ <td>22.3</td>
280
+ <td>42.7</td>
281
+ </tr>
282
+ <tr>
283
+ <td>SigLIP</td>
284
+ <td>31.9</td>
285
+ <td>83.2</td>
286
+ <td>67.3</td>
287
+ <td>18.6</td>
288
+ <td>28.2</td>
289
+ <td>45.8</td>
290
+ </tr>
291
+ <tr>
292
+ <td>Supervised-IN21K</td>
293
+ <td>29.4</td>
294
+ <td>75.8</td>
295
+ <td>52.7</td>
296
+ <td>14.9</td>
297
+ <td>25.1</td>
298
+ <td>39.6</td>
299
+ </tr>
300
+ <tr>
301
+ <td>DINOv2</td>
302
+ <td>37.4</td>
303
+ <td>83.7</td>
304
+ <td>48.6</td>
305
+ <td>28.1</td>
306
+ <td>38.6</td>
307
+ <td>47.3</td>
308
+ </tr>
309
+ <tr>
310
+ <td>BioTrove-CLIP</td>
311
+ <td>22.1</td>
312
+ <td>82.5</td>
313
+ <td>45.7</td>
314
+ <td>20.4</td>
315
+ <td>37.7</td>
316
+ <td>41.7</td>
317
+ </tr>
318
+ <tr>
319
+ <td>BioCLIP</td>
320
+ <td>30.1</td>
321
+ <td>82.7</td>
322
+ <td>65.9</td>
323
+ <td>26.8</td>
324
+ <td>39.5</td>
325
+ <td>49.0</td>
326
+ </tr>
327
+ <tr>
328
+ <td>BioCLIP 2</td>
329
+ <td><b>39.8</b></td>
330
+ <td><b>89.1</b></td>
331
+ <td><b>69.5</b></td>
332
+ <td><b>48.6</b></td>
333
+ <td><b>40.4</b></td>
334
+ <td><b>57.5</b></td>
335
+ </tr>
336
+ </tbody>
337
+ </table>
338
+
339
+ #### Summary
340
+
341
+ BioCLIP 2 surpasses BioCLIP by 18.0% on zero-shot species classification benchmarks.
342
+ More importantly, although the model is trained to discriminate different species, it also achieves the best performance on tasks beyond species classification.
343
+ Notably, BioCLIP 2 yields a 10.2% performance gap over DINOv2, which is broadly used for diverse visual tasks.
344
+
345
+ ## Model Examination
346
+
347
+ Please check Section 5.4 of our [paper](), where we provide a formal analysis of the emergent properties of BioCLIP 2.
348
+
349
+ ## Technical Specifications
350
+
351
+ ### Compute Infrastructure
352
+ The training was performed on 32 NVIDIA H100-80GB GPUs distributed over 4 nodes on [Pittsburgh Supercomputing Center](https://www.psc.edu/)'s Bridges-2 Cluster.
353
+ Training for 30 epochs took 10 days to complete.
354
+
355
+
356
+ ## Citation
357
+
358
+ **BibTeX:**
359
+ ```
360
+ @software{Gu_BioCLIP_2_model,
361
+ author = {Jianyang Gu and Samuel Stevens and Elizabeth G Campolongo and Matthew J Thompson and Net Zhang and Jiaman Wu and Andrei Kopanev and Zheda Mai and Alexander E. White and James Balhoff and Wasila M Dahdul and Daniel Rubenstein and Hilmar Lapp and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
362
+ license = {MIT},
363
+ title = {{BioCLIP 2}},
364
+ url = {https://huggingface.co/imageomics/bioclip-2},
365
+ version = {1.0.0},
366
+ doi = {},
367
+ publisher = {Hugging Face},
368
+ year = {2025}
369
+ }
370
+ ```
371
+ Please also cite our paper:
372
+ ```
373
+ @article{gu2025bioclip,
374
+ title = {{B}io{CLIP} 2: Emergent Properties from Scaling Hierarchical Contrastive Learning},
375
+ author = {Jianyang Gu and Samuel Stevens and Elizabeth G Campolongo and Matthew J Thompson and Net Zhang and Jiaman Wu and Andrei Kopanev and Zheda Mai and Alexander E. White and James Balhoff and Wasila M Dahdul and Daniel Rubenstein and Hilmar Lapp and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
376
+ year = {2025},
377
+ eprint = {},
378
+ archivePrefix = {arXiv},
379
+ primaryClass = {cs.CV}
380
+ }
381
+ ```
382
+
383
+ Also consider citing OpenCLIP and BioCLIP:
384
+
385
+ ```
386
+ @software{ilharco_gabriel_2021_5143773,
387
+ author={Ilharco, Gabriel and Wortsman, Mitchell and Wightman, Ross and Gordon, Cade and Carlini, Nicholas and Taori, Rohan and Dave, Achal and Shankar, Vaishaal and Namkoong, Hongseok and Miller, John and Hajishirzi, Hannaneh and Farhadi, Ali and Schmidt, Ludwig},
388
+ title={OpenCLIP},
389
+ year={2021},
390
+ doi={10.5281/zenodo.5143773},
391
+ }
392
+ ```
393
+ Original BioCLIP Model:
394
+ ```
395
+ @software{bioclip2023,
396
+ author = {Samuel Stevens and Jiaman Wu and Matthew J. Thompson and Elizabeth G. Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M. Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
397
+ doi = {10.57967/hf/1511},
398
+ month = nov,
399
+ title = {BioCLIP},
400
+ version = {v0.1},
401
+ year = {2023}
402
+ }
403
+ ```
404
+ Original BioCLIP Paper:
405
+ ```
406
+ @inproceedings{stevens2024bioclip,
407
+ title = {{B}io{CLIP}: A Vision Foundation Model for the Tree of Life},
408
+ author = {Samuel Stevens and Jiaman Wu and Matthew J Thompson and Elizabeth G Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
409
+ booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
410
+ year = {2024},
411
+ pages = {19412-19424}
412
+ }
413
+ ```
414
+
415
+ ## Acknowledgements
416
+
417
+ This work was supported by the [Imageomics Institute](https://imageomics.org), which is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under [Award #2118240](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2118240) (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
418
+
419
+ <!-- ## Glossary -->
420
+
421
+ <!-- [optional] If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
422
+
423
+ <!-- ## More Information -->
424
+
425
+ <!-- [optional] Any other relevant information that doesn't fit elsewhere. -->
426
+
427
+ ## Model Card Authors
428
+
429
+ Jianyang Gu
430
+
431
+ ## Model Card Contact
432
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
open_clip_config.json ADDED
@@ -0,0 +1,32 @@
1
+ {
2
+ "model_cfg": {
3
+ "embed_dim": 768,
4
+ "vision_cfg": {
5
+ "image_size": 224,
6
+ "layers": 24,
7
+ "width": 1024,
8
+ "patch_size": 14
9
+ },
10
+ "text_cfg": {
11
+ "context_length": 77,
12
+ "vocab_size": 49408,
13
+ "width": 768,
14
+ "heads": 12,
15
+ "layers": 12
16
+ }
17
+ },
18
+ "preprocess_cfg": {
19
+ "mean": [
20
+ 0.48145466,
21
+ 0.4578275,
22
+ 0.40821073
23
+ ],
24
+ "std": [
25
+ 0.26862954,
26
+ 0.26130258,
27
+ 0.27577711
28
+ ],
29
+ "interpolation": "bicubic",
30
+ "resize_mode": "shortest"
31
+ }
32
+ }
open_clip_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b7b2bf6fbc95799e42630e394cf95803892ab447c1a8ab629dbc82fbeaf7dfef
3
+ size 1710517724
open_clip_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:035208dd616321f6c220384efaa1f10e9fcd8e1f2998ddefc88687718d186a85
3
+ size 1710639510
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<|startoftext|>",
4
+ "lstrip": false,
5
+ "normalized": true,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|endoftext|>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "<|endoftext|>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "unk_token": {
24
+ "content": "<|endoftext|>",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ }
30
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "49406": {
5
+ "content": "<|startoftext|>",
6
+ "lstrip": false,
7
+ "normalized": true,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "49407": {
13
+ "content": "<|endoftext|>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ }
20
+ },
21
+ "bos_token": "<|startoftext|>",
22
+ "clean_up_tokenization_spaces": true,
23
+ "do_lower_case": true,
24
+ "eos_token": "<|endoftext|>",
25
+ "errors": "replace",
26
+ "model_max_length": 77,
27
+ "pad_token": "<|endoftext|>",
28
+ "tokenizer_class": "CLIPTokenizer",
29
+ "unk_token": "<|endoftext|>"
30
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff