Update README.md

A RoPE ViT image classification model. The model follows a two-stage training process: first CAPI pretraining, then fine-tuning on the `ImageNet-21K` dataset.

## RoPE Configuration

This model implements EVA-style Rotary Position Embedding (RoPE). When working at a resolution other than the 224x224 training resolution, model behavior can be tuned through the `pt_grid_size` parameter (a sketch after the list below illustrates the effect):

- For inference at higher resolutions or when performing "shallow" fine-tuning, it's recommended to explicitly set `pt_grid_size=(16, 16)` (the default grid size during pretraining).
- For aggressive fine-tuning at higher resolutions, leave `pt_grid_size` as `None` to allow the model to adapt to the new resolution.
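
To make the knob concrete, here is a minimal sketch of the coordinate rescaling behind EVA-style RoPE. It is a hypothetical illustration of the idea, not this project's implementation: when `pt_grid_size` is set, patch coordinates on the new grid are mapped back into the pretraining range, so the rotary frequencies stay consistent with what the model saw during pretraining.

```python
import torch

def rope_patch_coords(grid_size, pt_grid_size=None):
    """Per-axis patch coordinates used to build EVA-style rotary embeddings.

    Illustrative sketch only (not this project's code). With pt_grid_size
    set, positions on the new grid are rescaled into the pretraining
    coordinate range, leaving the rotary frequency band unchanged.
    """
    coords = [torch.arange(n, dtype=torch.float32) for n in grid_size]
    if pt_grid_size is not None:
        # Map new-resolution positions back into the pretraining range.
        coords = [c * (pt_n / n) for c, n, pt_n in zip(coords, grid_size, pt_grid_size)]
    # All (row, col) pairs, one per patch token: shape (H * W, 2)
    return torch.cartesian_prod(*coords)

# 336x336 input with patch size 14 -> 24x24 token grid,
# rescaled into the 16x16 pretraining range
coords = rope_patch_coords((24, 24), pt_grid_size=(16, 16))
print(coords.shape)  # torch.Size([576, 2])
```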

Setting `pt_grid_size` during inference:

```sh
# When running inference with a custom resolution (e.g., 336x336)
python predict.py --network rope_vit_reg4_b14 -t capi-imagenet21k --model-config '{"pt_grid_size":[16, 16]}' --size 336 ...
```
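
With the 14-pixel patch size implied by the model name, a 336x336 input yields a 24x24 token grid (336 / 14 = 24); pinning `pt_grid_size` to `(16, 16)` keeps the rotary coordinates in the range the model saw during pretraining.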

Converting the model with explicit RoPE configuration:

```sh
python tool.py convert-model --network rope_vit_reg4_b14 -t capi-imagenet21k --add-config '{"pt_grid_size":[16, 16]}'
```
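
Unlike the per-run `--model-config` override above, `--add-config` bakes the value into the converted model's configuration, so it should not need to be supplied again at inference time.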
## Model Details
- **Model Type:** Image classification and detection backbone