Update README.md
README.md
CHANGED
@@ -6,3 +6,36 @@ library_name: timm
license: cc-by-nc-4.0
---
# Model card for vit_large_patch14_clip_336.laion2b_ft_augreg_inat21
This model is part of a series of `timm` fine-tuning experiments with higher-capacity models on the iNaturalist 2021 competition data (https://github.com/visipedia/inat_comp/tree/master/2021).

Covering 10,000 species, this dataset and these models are fun to explore via the classification widget with pictures from your backyard, but they are quite a bit smaller than the models you can find on the iNaturalist website (https://www.inaturalist.org/blog/75633-a-new-computer-vision-model-v2-1-including-1-770-new-taxa).

No extra metadata was used for training these models (as was the case for the competition); it was a straightforward fine-tune to explore differences in model pretraining data.

| Model | Top-1 | Top-5 | Img Size (Train) | Paper |
|-------|-------|-------|----------|-------|
| [eva02_large_patch14_clip_336.merged2b_ft_inat21](https://huggingface.co/timm/eva02_large_patch14_clip_336.merged2b_ft_inat21) | 92.05 | 98.01 | 336 | https://arxiv.org/abs/2303.11331 |
| [vit_large_patch14_clip_336.datacompxl_ft_augreg_inat21](https://huggingface.co/timm/vit_large_patch14_clip_336.datacompxl_ft_augreg_inat21) | 91.98 | 98.03 | 336 | https://arxiv.org/abs/2304.14108 |
| [vit_large_patch14_clip_336.laion2b_ft_augreg_inat21](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_augreg_inat21) | 91.48 | 97.89 | 336 | https://arxiv.org/abs/2212.07143 |
| [convnext_large_mlp.laion2b_ft_augreg_inat21](https://huggingface.co/timm/convnext_large_mlp.laion2b_ft_augreg_inat21) | 90.95 | 97.68 | 448 (384) | |
| [vit_large_patch14_clip_336.datacompxl_ft_inat21](https://huggingface.co/timm/vit_large_patch14_clip_336.datacompxl_ft_inat21) | 90.85 | 97.68 | 336 | https://arxiv.org/abs/2304.14108 |
| [convnext_large_mlp.laion2b_ft_augreg_inat21](https://huggingface.co/timm/convnext_large_mlp.laion2b_ft_augreg_inat21) | 90.62 | 97.61 | 384 | |
| [vit_large_patch14_clip_336.laion2b_ft_in12k_in1k_inat21](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in12k_in1k_inat21) | 90.29 | 97.44 | 336 | https://arxiv.org/abs/2212.07143 |

## Run Validation
```
python validate.py /tfds/ --dataset tfds/i_naturalist2021 --model hf-hub:timm/vit_large_patch14_clip_336.laion2b_ft_augreg_inat21 --split val --amp
```
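
To make the Top-1 and Top-5 columns in the table easier to interpret, here is a minimal sketch of how top-k accuracy is usually computed, assuming the reported numbers follow the standard definition (the percentage of validation images whose true class is among the k highest-scoring predictions). The function name `topk_accuracy` and the toy scores are hypothetical and are not taken from `validate.py`.

```python
from typing import Sequence


def topk_accuracy(
    scores: Sequence[Sequence[float]], targets: Sequence[int], k: int = 1
) -> float:
    """Percentage of samples whose true class is among the k top-scoring classes."""
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in topk
    return 100.0 * hits / len(targets)


# Toy data: 4 samples over 6 hypothetical classes (the real models predict 10,000).
scores = [
    [0.10, 0.70, 0.05, 0.05, 0.05, 0.05],  # true class 1 is ranked first
    [0.30, 0.20, 0.25, 0.10, 0.10, 0.05],  # true class 2 is ranked second
    [0.05, 0.10, 0.15, 0.20, 0.20, 0.30],  # true class 0 is ranked last
    [0.40, 0.10, 0.10, 0.10, 0.20, 0.10],  # true class 0 is ranked first
]
targets = [1, 2, 0, 0]

top1 = topk_accuracy(scores, targets, k=1)
top5 = topk_accuracy(scores, targets, k=5)
print(f"top-1: {top1:.2f}  top-5: {top5:.2f}")
```

On this toy data two of four samples are correct at k=1 and three of four at k=5, which is why Top-5 is always at least as high as Top-1 in the table.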
## Citation

```bibtex
@inproceedings{cherti2023reproducible,
  title={Reproducible scaling laws for contrastive language-image learning},
  author={Cherti, Mehdi and Beaumont, Romain and Wightman, Ross and Wortsman, Mitchell and Ilharco, Gabriel and Gordon, Cade and Schuhmann, Christoph and Schmidt, Ludwig and Jitsev, Jenia},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={2818--2829},
  year={2023}
}
```