Qwen3-VL Flowers102 Classifier

Qwen3-VL classifier checkpoint at step 1,700 for flower classification on the Flowers102 dataset. It is built on oscarqjh/Qwen3-VL-4B-Instruct-Flowers102-Open-QA, an already fine-tuned model, and adds a classification head for 102 flower categories.

Model Description

This is a multimodal classifier that combines:

  • Vision Encoder: Qwen3-VL vision transformer
  • Language Model: Qwen3-VL language model
  • Classification Head: Custom linear layers for 102 flower classes

The model takes an image together with a text prompt (a question about the flower) and outputs logits over the 102 flower classes.

Training Details

Training Configuration

  • Base Model: oscarqjh/Qwen3-VL-4B-Instruct-Flowers102-Open-QA
  • Checkpoint: Step 1,700
  • Training Steps: 1,700 / 2,000
  • Total Epochs: 1.90 epochs
  • Batch Size: 1 per device
  • Gradient Accumulation Steps: 2
  • Effective Batch Size: 2
  • Initial Learning Rate: 9.00e-06
  • Final Learning Rate: 3.34e-05
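
The batch-size and progress figures above follow from simple arithmetic. The lines below are a minimal sketch that assumes a single training device; the card does not state the device count, so `num_devices` is a hypothetical parameter introduced only for illustration.

```python
# Values taken from the training configuration above.
per_device_batch_size = 1
gradient_accumulation_steps = 2
num_devices = 1  # assumption: the device count is not stated in the card

# Number of samples that contribute to each optimizer step.
effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)

# Fraction of the planned schedule completed at this checkpoint.
checkpoint_step = 1700
total_steps = 2000
progress = checkpoint_step / total_steps

print(f"Effective batch size: {effective_batch_size}")
print(f"Schedule completed: {progress:.0%}")
```

With more than one device, the same formula would give a proportionally larger effective batch size.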

Training Results

  • Final Training Loss: 3.6995
  • Training Runtime: N/A
  • Training Speed: N/A
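
To put the final loss in context, it can be compared with the loss of a uniform guess over 102 classes. This sketch assumes the reported value is a mean cross-entropy measured in nats, which the card does not state explicitly.

```python
import math

num_classes = 102
final_training_loss = 3.6995  # reported above

# Cross-entropy of a uniform prediction over all classes: ln(num_classes).
chance_loss = math.log(num_classes)

# Under the cross-entropy assumption, exp(-loss) is the geometric-mean
# probability the model assigns to the true class.
mean_true_class_prob = math.exp(-final_training_loss)

print(f"Chance-level loss: {chance_loss:.4f}")
print(f"Geometric-mean true-class probability: {mean_true_class_prob:.4f}")
print(f"Uniform probability: {1 / num_classes:.4f}")
```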

Model Architecture

  • Number of Classes: 102
  • Base Model Frozen: True
  • Dropout Rate: 0.1
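
The architecture settings above can be illustrated with a small PyTorch sketch. This is not the actual `Qwen3VLClassifier` implementation, which the card does not show: the `ClassificationHead` class, the `freeze` helper, the last-token pooling, and the single linear layer are all assumptions chosen for illustration, and the real head may use more layers.

```python
import torch
from torch import nn


class ClassificationHead(nn.Module):
    """Hypothetical head: dropout followed by a single linear layer."""

    def __init__(self, hidden_size: int, num_classes: int = 102, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.linear = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Pool by taking the hidden state of the last non-padded token
        # (assumes right padding).
        last_index = attention_mask.sum(dim=1) - 1
        batch_index = torch.arange(hidden_states.size(0), device=hidden_states.device)
        pooled = hidden_states[batch_index, last_index]
        return self.linear(self.dropout(pooled))


def freeze(module: nn.Module) -> None:
    """Disable gradients so that only the head would be trained."""
    for param in module.parameters():
        param.requires_grad = False


# Stand-ins for the frozen backbone and its output:
# batch of 2 sequences, 5 tokens each, hidden size 16.
backbone = nn.Linear(16, 16)
freeze(backbone)
hidden = backbone(torch.randn(2, 5, 16))
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])

head = ClassificationHead(hidden_size=16)
logits = head(hidden, mask)
print(logits.shape)
```

Freezing the base model means only the head's parameters receive gradient updates, which keeps training cheap relative to updating all 4B parameters.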

Usage

import torch
from PIL import Image
from transformers import AutoProcessor

# Project-local module from the training repository (not part of transformers)
from src.models.qwen_classifier import Qwen3VLClassifier

# Load model and processor
model = Qwen3VLClassifier.from_pretrained("your-username/model-name")
model.eval()  # inference mode, so the head's dropout is disabled
processor = AutoProcessor.from_pretrained("oscarqjh/Qwen3-VL-4B-Instruct-Flowers102-Open-QA")

# Prepare inputs (convert to RGB so grayscale or RGBA files are handled)
image = Image.open("flower.jpg").convert("RGB")
text = "What type of flower is this?"

# Create chat format
messages = [{
    'role': 'user',
    'content': [
        {'type': 'image', 'image': image},
        {'type': 'text', 'text': text}
    ]
}]

# Process inputs
text_input = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
inputs = processor(text=[text_input], images=[image], return_tensors="pt", padding=True)

# Get predictions (single image, so .item() is valid)
with torch.no_grad():
    outputs = model(**inputs)

# Compute class probabilities once and take the most likely class
probs = torch.softmax(outputs.logits, dim=-1)
confidence, predicted_class = probs.max(dim=-1)

print(f"Predicted class: {predicted_class.item()}, Confidence: {confidence.item():.4f}")
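
When a single prediction is not enough, the same logits can be turned into a ranked shortlist. The sketch below uses random stand-in logits so that it runs on its own; in practice `logits` would be `outputs.logits` from the model call above, and the choice of `k=5` is arbitrary.

```python
import torch

# Stand-in logits for one image over 102 classes (assumption for illustration;
# replace with `outputs.logits` from the model call).
torch.manual_seed(0)
logits = torch.randn(1, 102)

# Convert logits to probabilities and keep the five most likely classes.
probs = torch.softmax(logits, dim=-1)
top_probs, top_classes = torch.topk(probs, k=5, dim=-1)

for class_id, prob in zip(top_classes[0].tolist(), top_probs[0].tolist()):
    print(f"class {class_id}: {prob:.4f}")
```

The printed values are class indices; mapping them to flower names requires the label list used during training, which this card does not include.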

Dataset

The model was trained on the Flowers102 dataset, which contains 102 flower categories with:

  • 7,169 training images
  • 1,020 validation images
  • 6,149 test images

Citation

If you use this model, please cite the original Qwen3-VL paper and the Flowers102 dataset.

Safetensors · Model size: 4B params · Tensor type: BF16