---
license: mit
base_model:
- mistralai/Pixtral-12B-2409
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- lora
datasets:
- Multimodal-Fatima/FGVC_Aircraft_train
- takara-ai/FloodNet_2021-Track_2_Dataset_HF
---
|
|
|
|
|
<img src="https://takara.ai/images/logo-24/TakaraAi.svg" width="200" alt="Takara.ai Logo" /> |
|
|
From the Frontier Research Team at **Takara.ai**, we present a specialized LoRA adapter for aerial imagery analysis and visual question answering.
|
|
|
|
|
--- |
|
|
|
|
|
# pixtral_aerial_VQA_adapter |
|
|
|
|
|
## Overview |
|
|
This repository contains a fine-tuned LoRA adapter for the Pixtral-12B model, optimized specifically for aerial imagery analysis and visual question answering. The adapter enables detailed processing of aerial footage with a focus on construction site surveying, structural assessment, and environmental monitoring. |
|
|
|
|
|
## Model Details |
|
|
- **Type**: LoRA Adapter |
|
|
- **Total Parameters**: 6,225,920 |
|
|
- **Memory Usage**: 23.75 MB |
|
|
- **Precision**: torch.float32
|
|
- **Layer Types**: |
|
|
- lora_A: 40 |
|
|
- lora_B: 40 |
|
|
- **Base Model**: [mistralai/Pixtral-12B-2409](https://huggingface.co/mistralai/Pixtral-12B-2409) |
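The memory figure above follows directly from the parameter count at float32 precision (4 bytes per parameter); a quick sanity check:

```python
def adapter_memory_mib(num_params: int, bytes_per_param: int = 4) -> float:
    """Memory footprint of an adapter, in MiB (float32 -> 4 bytes per parameter)."""
    return num_params * bytes_per_param / (1024 ** 2)

# The card's figures: 6,225,920 float32 parameters -> 23.75 MB
print(adapter_memory_mib(6_225_920))  # 23.75
```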
|
|
|
|
|
## Capabilities |
|
|
The adapter enhances Pixtral's ability to: |
|
|
- Identify and describe construction elements in aerial imagery |
|
|
- Detect structural issues in buildings and infrastructure |
|
|
- Analyze progress in construction projects |
|
|
- Monitor environmental changes and flooding events |
|
|
- Process high-resolution aerial imagery with improved detail recognition |
|
|
|
|
|
## Intended Use |
|
|
- **Primary intended uses**: Processing aerial footage of construction sites for structural and construction surveying. |
|
|
- Can also be applied to other detailed VQA use cases involving aerial footage.
|
|
- Suitable for disaster response and assessment applications, particularly flood monitoring. |
|
|
|
|
|
## Training Data |
|
|
- **Datasets**:
|
|
1. [FloodNet Track 2 dataset](https://huggingface.co/datasets/takara-ai/FloodNet_2021-Track_2_Dataset_HF) |
|
|
2. Subset of [FGVC Aircraft dataset](https://huggingface.co/datasets/Multimodal-Fatima/FGVC_Aircraft_train) |
|
|
3. Custom dataset of 10 image-caption pairs created using Pixtral |
|
|
|
|
|
## Training Procedure |
|
|
- **Training method**: LoRA (Low-Rank Adaptation) |
|
|
- **Base model**: [Ertugrul/Pixtral-12B-Captioner-Relaxed](https://huggingface.co/Ertugrul/Pixtral-12B-Captioner-Relaxed), a captioning fine-tune of Pixtral-12B-2409
|
|
- **Training hardware**: Nebius-hosted NVIDIA H100 machine |
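The exact fine-tuning hyperparameters are not published in this card; a typical PEFT setup for an adapter of this kind might look like the sketch below, where the rank, alpha, dropout, and target modules are illustrative assumptions rather than the values actually used.

```python
from peft import LoraConfig, get_peft_model

# Illustrative LoRA configuration -- r, lora_alpha, lora_dropout, and
# target_modules are assumptions for a sketch, not the values used to
# train this adapter.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

# model = get_peft_model(base_model, lora_config)  # wrap a loaded base model
```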
|
|
|
|
|
## Usage Example |
|
|
|
|
|
Loading a LoRA adapter requires the base model weights as well; the sketch below uses `peft` to attach the adapter and assumes the base checkpoint is available in a `transformers`-compatible format:

```python
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import PeftModel
import torch
from PIL import Image

# Load the base model and processor, then attach the LoRA adapter
base_model_id = "mistralai/Pixtral-12B-2409"
adapter_id = "takara-ai/pixtral_aerial_VQA_adapter"

processor = AutoProcessor.from_pretrained(base_model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

# Load the image and build a Pixtral-style instruction prompt
image = Image.open("path_to_aerial_image.jpg")
prompt = "<s>[INST]Describe the construction progress visible in this aerial image.\n[IMG][/INST]"

# Generate a response
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
)
response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex
@misc{rahnemoonfar2020floodnet,
  title={FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding},
  author={Maryam Rahnemoonfar and Tashnim Chowdhury and Argho Sarkar and Debvrat Varshney and Masoud Yari and Robin Murphy},
  year={2020},
  eprint={2012.02951},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  doi={10.48550/arXiv.2012.02951}
}
```
|
|
|
|
|
--- |
|
|
For research inquiries and press, please reach out to [email protected] |
|
|
|
|
|
> 人類を変革する (Transforming humanity)