---
library_name: transformers
license: apache-2.0
base_model: google/vit-base-patch16-224
tags:
- image-classification
- cifar10
- computer-vision
- vision-transformer
- transfer-learning
metrics:
- accuracy
model-index:
- name: vit-base-cifar10-augmented
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: CIFAR-10
      type: cifar10
    metrics:
    - type: accuracy
      value: 0.9554
---
# vit-base-cifar10-augmented
This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) on the CIFAR-10 dataset, trained with data augmentation.
It achieves the following results:
- Training loss (final epoch): 0.0445
- Test accuracy (best, reached at epoch 3): 95.54%
## 🧠 Model Description
The base model is a Vision Transformer (ViT) originally trained on ImageNet-21k. This version has been fine-tuned on CIFAR-10, a standard image classification benchmark, using PyTorch and Hugging Face Transformers.
Training was done using extensive data augmentation, including random crops, flips, rotations, and color jitter to improve generalization on small input images (32×32, resized to 224×224).
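For quick experimentation, the model can be loaded through the `image-classification` pipeline. A minimal sketch follows; the Hub repo id below is a placeholder, since the card does not state the published path:

```python
from transformers import pipeline

# Placeholder repo id: replace with the actual Hub path of this checkpoint.
classifier = pipeline("image-classification", model="your-username/vit-base-cifar10-augmented")

# The pipeline handles resizing to 224x224 and normalization internally.
predictions = classifier("path/to/image.png")
print(predictions)  # e.g. [{'label': 'airplane', 'score': 0.98}, ...]
```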
## ✅ Intended Uses & Limitations
### Intended uses
- Educational and research use on small image classification tasks
- Benchmarking transfer learning for ViT on CIFAR-10
- Demonstrating the impact of data augmentation on fine-tuning performance
### Limitations
- Not optimized for real-time inference
- Fine-tuned only on CIFAR-10; not suitable for general-purpose image classification
- Requires resized input (224×224)
## 📦 Training and Evaluation Data
- Dataset: CIFAR-10
- Size: 60,000 images (10 classes)
- Split: 75% training / 25% test (a custom split; the standard CIFAR-10 split is 50,000 train / 10,000 test)
All images were resized to 224×224 and normalized using ViT’s original mean/std values.
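As a minimal sketch of that loading and preprocessing (assuming the `cifar10` dataset on the Hugging Face Hub, whose image column is named `img`; the 75/25 split reported above may differ from the Hub's default train/test layout):

```python
from datasets import load_dataset
from transformers import ViTImageProcessor

dataset = load_dataset("cifar10")  # default Hub split: 50,000 train / 10,000 test
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")

# This checkpoint's processor resizes to 224x224 and normalizes with
# mean=[0.5, 0.5, 0.5] and std=[0.5, 0.5, 0.5].
inputs = processor(images=dataset["train"][0]["img"], return_tensors="pt")
print(inputs["pixel_values"].shape)  # torch.Size([1, 3, 224, 224])
```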
## ⚙️ Training Procedure
### Hyperparameters
- Learning rate: 1e-4
- Optimizer: Adam
- Batch size: 8
- Epochs: 10
- Scheduler: ReduceLROnPlateau
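A minimal PyTorch training loop consistent with these hyperparameters might look as follows. This is a sketch, not the exact script: `train_loader` and `evaluate` are hypothetical stand-ins, and the metric the scheduler monitors is an assumption.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=10,
    ignore_mismatched_sizes=True,  # swaps the 1000-class ImageNet head for a 10-class one
)
optimizer = Adam(model.parameters(), lr=1e-4)
scheduler = ReduceLROnPlateau(optimizer, mode="max")  # assumption: stepped on test accuracy

for epoch in range(10):
    model.train()
    for batch in train_loader:  # hypothetical DataLoader with batch_size=8
        optimizer.zero_grad()
        outputs = model(pixel_values=batch["pixel_values"], labels=batch["labels"])
        outputs.loss.backward()
        optimizer.step()
    test_accuracy = evaluate(model)  # hypothetical evaluation helper
    scheduler.step(test_accuracy)
```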
### Data Augmentation Used
```python
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
])
```
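These transforms apply at training time only. A plausible evaluation-time pipeline (an assumption, not stated in this card) would skip the augmentation and keep only resizing, tensor conversion, and normalization:

```python
from torchvision import transforms

eval_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```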
### Training Results
| Epoch | Training Loss | Test Accuracy |
|---|---|---|
| 1 | 0.1969 | 94.62% |
| 2 | 0.1189 | 95.05% |
| 3 | 0.0899 | 95.54% |
| 4 | 0.0720 | 94.68% |
| 5 | 0.0650 | 94.84% |
| 6 | 0.0576 | 94.76% |
| 7 | 0.0560 | 95.33% |
| 8 | 0.0488 | 94.31% |
| 9 | 0.0499 | 95.42% |
| 10 | 0.0445 | 94.33% |
## 🧪 Framework Versions
- transformers: 4.50.0
- torch: 2.6.0+cu124
- datasets: 3.4.1
- tokenizers: 0.21.1