Update README.md

A RoPE ViT image classification model. The model follows a two-stage training process: first CAPI pretraining, then fine-tuning on the `ImageNet-21K` dataset.

## RoPE Configuration

This model implements EVA-style Rotary Position Embedding (RoPE). When working at a resolution other than the 224x224 training resolution, model behavior can be tuned through the `pt_grid_size` parameter (a sketch after the list below illustrates the effect):

- For inference at higher resolutions or when performing "shallow" fine-tuning, it's recommended to explicitly set `pt_grid_size=(16, 16)` (the default grid size during pretraining).
- For aggressive fine-tuning at higher resolutions, leave `pt_grid_size` as `None` to allow the model to adapt to the new resolution.
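
To make the knob concrete, here is a minimal sketch of the coordinate rescaling behind EVA-style RoPE. It is a hypothetical illustration of the idea, not this project's implementation: when `pt_grid_size` is set, patch coordinates on the new grid are mapped back into the pretraining range, so the rotary frequencies stay consistent with what the model saw during pretraining.

```python
import torch

def rope_patch_coords(grid_size, pt_grid_size=None):
    """Per-axis patch coordinates used to build EVA-style rotary embeddings.

    Illustrative sketch only (not this project's code). With pt_grid_size
    set, positions on the new grid are rescaled into the pretraining
    coordinate range, leaving the rotary frequency band unchanged.
    """
    coords = [torch.arange(n, dtype=torch.float32) for n in grid_size]
    if pt_grid_size is not None:
        # Map new-resolution positions back into the pretraining range.
        coords = [c * (pt_n / n) for c, n, pt_n in zip(coords, grid_size, pt_grid_size)]
    # All (row, col) pairs, one per patch token: shape (H * W, 2)
    return torch.cartesian_prod(*coords)

# 336x336 input with patch size 14 -> 24x24 token grid,
# rescaled into the 16x16 pretraining range
coords = rope_patch_coords((24, 24), pt_grid_size=(16, 16))
print(coords.shape)  # torch.Size([576, 2])
```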

Setting `pt_grid_size` during inference:

```sh
# When running inference with a custom resolution (e.g., 336x336)
python predict.py --network rope_vit_reg4_b14 -t capi-imagenet21k --model-config '{"pt_grid_size":[16, 16]}' --size 336 ...
```
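
With the 14-pixel patch size implied by the model name, a 336x336 input yields a 24x24 token grid (336 / 14 = 24); pinning `pt_grid_size` to `(16, 16)` keeps the rotary coordinates in the range the model saw during pretraining.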

Converting the model with explicit RoPE configuration:

```sh
python tool.py convert-model --network rope_vit_reg4_b14 -t capi-imagenet21k --add-config '{"pt_grid_size":[16, 16]}'
```
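
Unlike the per-run `--model-config` override above, `--add-config` bakes the value into the converted model's configuration, so it should not need to be supplied again at inference time.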
## Model Details
- **Model Type:** Image classification and detection backbone