hassonofer committed · Commit baf07b9 · verified · 1 Parent(s): 3656e2d

Update README.md

Files changed (1): README.md +20 -0
README.md CHANGED
@@ -15,6 +15,26 @@ datasets:
 
 A RoPE ViT image classification model. The model follows a two-stage training process: first CAPI pretraining, then fine-tuning on the `ImageNet-21K` dataset.
 
+## RoPE Configuration
+
+This model implements EVA-style Rotary Position Embedding (RoPE). When running at a resolution other than the training resolution (224x224), model behavior can be controlled through the `pt_grid_size` parameter:
+
+- For inference at higher resolutions, or for "shallow" fine-tuning, explicitly set `pt_grid_size=(16, 16)` (the grid size used during pretraining).
+- For aggressive fine-tuning at higher resolutions, leave `pt_grid_size` as `None` so the model adapts to the new resolution.
+
+Setting `pt_grid_size` during inference:
+
+```sh
+# When running inference with a custom resolution (e.g., 336x336)
+python predict.py --network rope_vit_reg4_b14 -t capi-imagenet21k --model-config '{"pt_grid_size":[16, 16]}' --size 336 ...
+```
+
+Converting the model with explicit RoPE configuration:
+
+```sh
+python tool.py convert-model --network rope_vit_reg4_b14 -t capi-imagenet21k --add-config '{"pt_grid_size":[16, 16]}'
+```
+
 ## Model Details
 
 - **Model Type:** Image classification and detection backbone
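
To illustrate what fixing the pretraining grid does, here is a minimal NumPy sketch of the coordinate rescaling idea behind EVA-style RoPE. This is not Birder's internal implementation, and the `rope_coords` helper is hypothetical: it only shows that with a fixed `pt_grid_size`, patch coordinates at a larger resolution are squeezed back into the pretraining grid's range, so the rotary frequencies stay in the regime the model saw during pretraining.

```python
import numpy as np

def rope_coords(grid_size, pt_grid_size=None):
    """Hypothetical sketch: 2D patch coordinates used to build rotary angles.

    With pt_grid_size set (EVA-style), coordinates at a new resolution are
    rescaled into the pretraining grid's range; with None, they extend
    beyond it and the model must adapt to unseen positions.
    """
    h, w = grid_size
    y = np.arange(h, dtype=np.float64)
    x = np.arange(w, dtype=np.float64)
    if pt_grid_size is not None:
        pt_h, pt_w = pt_grid_size
        y = y * (pt_h / h)  # rescale into the pretraining coordinate range
        x = x * (pt_w / w)
    yy, xx = np.meshgrid(y, x, indexing="ij")
    return yy, xx

# 224px / patch 14 = 16 patches per side; 336px / 14 = 24 patches per side
yy_fixed, _ = rope_coords((24, 24), pt_grid_size=(16, 16))
yy_free, _ = rope_coords((24, 24))
print(yy_fixed.max())  # 23 * 16/24 ≈ 15.33 — stays within the 16x16 range
print(yy_free.max())   # 23.0 — extends beyond what pretraining saw
```

This mirrors the recommendation above: keep the coordinate range frozen (`pt_grid_size=(16, 16)`) when the weights will barely change, and let it grow (`None`) only when fine-tuning is aggressive enough to adapt to the new positions.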