Update README.md
README.md
CHANGED
@@ -8,4 +8,33 @@ tags:
 - lora
 datasets:
 - Multimodal-Fatima/FGVC_Aircraft_train
----
+---
+# pixtral_aerial_VQA_adapter
+
+## Model Details
+
+- **Type**: LoRA Adapter
+- **Total Parameters**: 6,225,920
+- **Memory Usage**: 23.75 MB
+- **Precisions**: torch.float32
+- **Layer Types**:
+  - lora_A: 40
+  - lora_B: 40
+
+## Intended Use
+
+- **Primary intended uses**: Processing aerial footage of construction sites for structural and construction surveying.
+- Can also be applied to any detailed VQA use cases with aerial footage.
+
+## Training Data
+
+- **Dataset**:
+  1. FloodNet Track 2 dataset
+  2. Subset of FGVC Aircraft dataset
+  3. Custom dataset of 10 image-caption pairs created using Pixtral
+
+## Training Procedure
+
+- **Training method**: LoRA (Low-Rank Adaptation)
+- **Base model**: Ertugrul/Pixtral-12B-Captioner-Relaxed
+- **Training hardware**: Nebius-hosted NVIDIA H100 machine
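
The Memory Usage figure in the added model card appears to follow directly from the parameter count and the stated precision. The sketch below checks that arithmetic, assuming the 23.75 MB value is measured in binary megabytes (MiB) and covers only the adapter weights; the function name `adapter_size_mib` is hypothetical and not part of the README or any library.

```python
# Figures copied from the "Model Details" section added in this diff.
TOTAL_PARAMETERS = 6_225_920
BYTES_PER_FLOAT32 = 4  # torch.float32 stores each value in 4 bytes


def adapter_size_mib(num_params: int, bytes_per_param: int) -> float:
    """Return the raw weight size in MiB (1 MiB = 1024 * 1024 bytes).

    Assumption: only the parameter tensors are counted, with no file
    header or optimizer state.
    """
    return num_params * bytes_per_param / (1024 ** 2)


size = adapter_size_mib(TOTAL_PARAMETERS, BYTES_PER_FLOAT32)
print(f"{size:.2f} MiB")  # 23.75 MiB, matching the model card
```

Under the same assumption, storing the adapter in a 2-byte format such as float16 would halve the figure: `adapter_size_mib(TOTAL_PARAMETERS, 2)` gives 11.875.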