SC4001-Flowers102 Collection: dataset and models for the SC4001 Research Project.
Qwen3-VL classifier checkpoint at step 1700, trained on the Flowers102 dataset using the fine-tuned oscarqjh/Qwen3-VL-4B-Instruct-Flowers102-Open-QA model. This checkpoint adds a classification head for 102 flower categories on top of the already fine-tuned base model.
This model is a fine-tuned version of oscarqjh/Qwen3-VL-4B-Instruct-Flowers102-Open-QA for flower classification on the Flowers102 dataset.
This is a multimodal classifier that combines the fine-tuned Qwen3-VL-4B-Instruct vision-language backbone with a classification head over the 102 Flowers102 categories.
The model takes both image and text inputs (questions about flowers) and outputs classification predictions.
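The actual classifier lives in the project's src/models/qwen_classifier.py. As a rough, hypothetical sketch only (the real Qwen3VLClassifier may differ), a wrapper of this kind typically runs the vision-language backbone, pools a hidden state, and applies a linear head over the 102 classes; the class name, pooling choice, and constructor arguments below are assumptions:

import torch.nn as nn

# Illustrative sketch only; not the project's actual implementation
class FlowerClassifierSketch(nn.Module):
    def __init__(self, backbone, hidden_size, num_labels=102):
        super().__init__()
        self.backbone = backbone  # e.g. the fine-tuned Qwen3-VL-4B-Instruct model
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, **inputs):
        # Run the backbone and keep hidden states instead of generated text
        out = self.backbone(**inputs, output_hidden_states=True)
        last_hidden = out.hidden_states[-1]   # (batch, seq_len, hidden_size)
        pooled = last_hidden[:, -1, :]        # pool on the final token (one common choice)
        return self.classifier(pooled)        # (batch, 102) logits

To use the released checkpoint itself: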
import torch
from PIL import Image
from transformers import AutoProcessor

# Qwen3VLClassifier is the project's custom wrapper class (src/models/qwen_classifier.py)
from src.models.qwen_classifier import Qwen3VLClassifier
# Load model and processor
model = Qwen3VLClassifier.from_pretrained("your-username/model-name")
processor = AutoProcessor.from_pretrained("oscarqjh/Qwen3-VL-4B-Instruct-Flowers102-Open-QA")
# Prepare inputs
image = Image.open("flower.jpg")
text = "What type of flower is this?"
# Create chat format
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": text},
    ],
}]
# Process inputs
text_input = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
inputs = processor(text=[text_input], images=[image], return_tensors="pt", padding=True)
# Get predictions
with torch.no_grad():
    outputs = model(**inputs)

predicted_class = outputs.logits.argmax(dim=-1).item()
confidence = torch.softmax(outputs.logits, dim=-1).max().item()
print(f"Predicted class: {predicted_class}, Confidence: {confidence:.4f}")
The model was trained on the Flowers102 dataset, which contains 102 flower categories with between 40 and 258 images per class (8,189 images in total), split into 1,020 training, 1,020 validation, and 6,149 test images.
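For reference, the dataset can be pulled through torchvision (one convenient source; the project's own data pipeline may differ):

from torchvision.datasets import Flowers102

# Official splits: 1,020 train, 1,020 validation, and 6,149 test images
train_set = Flowers102(root="data", split="train", download=True)
val_set = Flowers102(root="data", split="val", download=True)
test_set = Flowers102(root="data", split="test", download=True)
print(len(train_set), len(val_set), len(test_set))  # 1020 1020 6149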
If you use this model, please cite the original Qwen3-VL paper and the Flowers102 dataset.
Base model: Qwen/Qwen3-VL-4B-Instruct