detorcla / cifar10-resnet18

detorcla committed on May 25 · Commit 06a2cf1 · 1 parent: 5ebdc45

Create README.md

Files changed (1): README.md +97 -0

---
library_name: transformers
license: apache-2.0
base_model: google/vit-base-patch16-224
tags:
- image-classification
- cifar10
- computer-vision
- vision-transformer
- transfer-learning
metrics:
- accuracy
model-index:
- name: vit-base-cifar10-augmented
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: CIFAR-10
      type: cifar10
    metrics:
      - type: accuracy
        value: 0.9554
---

# vit-base-cifar10-augmented

This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) on the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html), trained with data augmentation.

Results from the per-epoch log in the Training Results section:

- **Best test accuracy:** 95.54% (epoch 3)
- **Final test accuracy:** 94.33% (epoch 10)
- **Final training loss:** 0.0445 (epoch 10)

## 🧠 Model Description

The base model is a Vision Transformer (ViT-Base with 16×16 patches and 224×224 input) pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k. This version has been further fine-tuned on CIFAR-10, a standard image classification benchmark, using PyTorch and Hugging Face Transformers.

Training used extensive **data augmentation** (random resized crops, horizontal flips, rotations of up to ±10°, and color jitter) to improve generalization. CIFAR-10 images are only 32×32 pixels, so they are upsampled to the 224×224 input size the base model expects.
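
What the upsampling means for the model can be made concrete with a little arithmetic. The sketch below assumes only what the base-model name states (16×16 patches on a 224×224 input) plus the single `[CLS]` token that ViT prepends; it computes how many tokens each image becomes and how many original CIFAR-10 pixels one patch actually covers. The function name is illustrative.

```python
# Token arithmetic for ViT-Base/16 applied to upsampled CIFAR-10 images.
# Assumes a square 224x224 input, 16x16 non-overlapping patches, one [CLS] token.

def vit_token_stats(input_size: int = 224, patch_size: int = 16,
                    source_size: int = 32) -> dict:
    """Return patch-grid and sequence-length figures for a square image."""
    if input_size % patch_size != 0:
        raise ValueError("input_size must be divisible by patch_size")
    grid = input_size // patch_size          # patches per side
    num_patches = grid * grid                # patches per image
    seq_len = num_patches + 1                # plus the [CLS] token
    # Side length of one patch, measured in pixels of the original image.
    source_pixels_per_patch = patch_size * source_size / input_size
    return {
        "grid": grid,
        "num_patches": num_patches,
        "seq_len": seq_len,
        "source_pixels_per_patch": source_pixels_per_patch,
    }


if __name__ == "__main__":
    print(vit_token_stats())
```

Each of the 196 patches therefore spans only about 2.3×2.3 pixels of the original image: upsampling matches the input size the pre-trained weights expect, but it adds no new detail.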

## ✅ Intended Uses & Limitations

### Intended uses

- Educational and research use on small image classification tasks
- Benchmarking transfer learning for ViT on CIFAR-10
- Demonstrating the impact of data augmentation on fine-tuning performance

### Limitations

- Not optimized for real-time inference
- Fine-tuned only on CIFAR-10: every input is assigned one of the ten CIFAR-10 classes, so the model is not suitable for general-purpose image classification
- Expects 224×224 RGB input; native 32×32 CIFAR-10 images must be resized first
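
The closed-set limitation follows from how the outputs are decoded: a softmax over ten logits always picks one of the ten classes, however unlike CIFAR-10 the input is. The stdlib sketch below assumes the classifier head uses the standard CIFAR-10 class order listed in `CIFAR10_CLASSES`; the checkpoint's actual `id2label` mapping is not documented in this card, so check `model.config.id2label` before relying on it. The example logits are made up.

```python
import math

# Assumed label order: the standard CIFAR-10 class order.
# Verify against the checkpoint's config.id2label before use.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, top_k=3):
    """Return the top_k (label, probability) pairs for one image's 10 logits."""
    if len(logits) != len(CIFAR10_CLASSES):
        raise ValueError("expected one logit per CIFAR-10 class")
    probs = softmax(logits)
    ranked = sorted(zip(CIFAR10_CLASSES, probs), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy logits, not real model output.
    example = [0.1, 0.2, 0.0, 3.0, 0.5, 2.0, 0.1, 0.3, 0.0, 0.2]
    for label, p in decode(example):
        print(f"{label}: {p:.3f}")
```

Because the probabilities always sum to 1 across the ten classes, a high score is not evidence that the input actually belongs to CIFAR-10.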
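
## 📦 Training and Evaluation Data

- **Dataset**: [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html)
- **Size**: 60,000 color images of 32×32 pixels (10 classes)
- **Split**: 75% training (45,000 images), 25% test (15,000 images); this differs from the official CIFAR-10 split of 50,000 training and 10,000 test images

All images were resized to 224×224 and normalized with the per-channel mean/std values of the base ViT model.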
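
The split sizes and the normalization step can be sketched in a few lines of standard-library Python. This illustrates the arithmetic under stated assumptions and is not the author's preprocessing code: the shuffle seed is hypothetical (the card does not say how the 75/25 split was drawn), and the per-channel mean and std of 0.5 are assumed to be the values in the `google/vit-base-patch16-224` image-processor configuration.

```python
import random


def split_indices(n=60_000, train_frac=0.75, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/test lists.

    The seed is a hypothetical choice; the card does not document one.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(n * train_frac)
    return indices[:n_train], indices[n_train:]


# Assumed per-channel statistics of the base ViT image processor.
VIT_MEAN = (0.5, 0.5, 0.5)
VIT_STD = (0.5, 0.5, 0.5)


def normalize_pixel(rgb):
    """Map an 8-bit (R, G, B) pixel to the normalized range the model expects."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, VIT_MEAN, VIT_STD)
    )


if __name__ == "__main__":
    train, test = split_indices()
    print(len(train), len(test))           # 45000 15000
    print(normalize_pixel((0, 128, 255)))  # black -> -1.0, white -> 1.0
```

With these statistics, pixel values 0 and 255 map to -1.0 and 1.0, so normalized inputs lie in [-1, 1].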

## ⚙️ Training Procedure

### Hyperparameters

- Learning rate: `1e-4`
- Optimizer: `Adam`
- Batch size: `8`
- Epochs: `10`
- Scheduler: `ReduceLROnPlateau`
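
`ReduceLROnPlateau` lowers the learning rate only after the monitored metric has stopped improving for a number of epochs. The card does not state which metric was monitored or which `factor` and `patience` were used, so the stdlib sketch below is a simplified model of the scheduler's core rule with hypothetical settings: it watches test accuracy in "max" mode, starts from the card's learning rate of `1e-4`, and ignores PyTorch's `threshold`, `cooldown`, and `min_lr` options. The class name is illustrative.

```python
class PlateauScheduler:
    """Simplified sketch of plateau-based LR decay in "max" mode.

    Core rule of torch.optim.lr_scheduler.ReduceLROnPlateau: once the metric
    has failed to improve for more than `patience` consecutive epochs,
    multiply the learning rate by `factor`. Threshold, cooldown and min_lr
    handling are deliberately omitted.
    """

    def __init__(self, lr=1e-4, factor=0.1, patience=2):
        self.lr = lr
        self.factor = factor        # hypothetical value
        self.patience = patience    # hypothetical value
        self.best = float("-inf")
        self.num_bad_epochs = 0

    def step(self, metric):
        """Record one epoch's metric and return the learning rate to use next."""
        if metric > self.best:
            self.best = metric
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        if self.num_bad_epochs > self.patience:
            self.lr *= self.factor
            self.num_bad_epochs = 0
        return self.lr


if __name__ == "__main__":
    # Toy accuracy curve that stalls after the third epoch.
    sched = PlateauScheduler()
    for epoch, acc in enumerate([90.0, 92.0, 93.0, 92.5, 92.8, 92.9, 93.0], start=1):
        print(epoch, acc, sched.step(acc))
```

Under this rule the first epoch always sets the initial best value, so a 10-epoch run can accumulate at most nine non-improving epochs; with PyTorch's default `patience=10`, no reduction would ever be triggered.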

### Data Augmentation Used

- `RandomResizedCrop(224)`
- `RandomHorizontalFlip()`
- `RandomRotation(10)`
- `ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1)`
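
On 32×32 inputs, `RandomResizedCrop` is the most aggressive of these transforms: it first picks a random sub-region and only then resizes it to 224×224. The stdlib sketch below re-implements the crop-parameter sampling used by torchvision's `RandomResizedCrop.get_params`, assuming the default `scale=(0.08, 1.0)` and `ratio=(3/4, 4/3)` because the card passes only the output size. It illustrates the sampling rule and is not torchvision's code.

```python
import math
import random


def sample_crop(height=32, width=32, scale=(0.08, 1.0),
                ratio=(3 / 4, 4 / 3), rng=random):
    """Return (top, left, h, w) for one random resized crop.

    Rejection sampling: try up to 10 random (area, aspect ratio) pairs,
    then fall back to a center crop clamped to the allowed aspect range.
    """
    area = height * width
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(10):
        target_area = area * rng.uniform(scale[0], scale[1])
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            return top, left, h, w
    # Fallback: center crop whose aspect ratio is clamped into `ratio`.
    in_ratio = width / height
    if in_ratio < min(ratio):
        w, h = width, int(round(width / min(ratio)))
    elif in_ratio > max(ratio):
        h, w = height, int(round(height * max(ratio)))
    else:
        w, h = width, height
    return (height - h) // 2, (width - w) // 2, h, w


if __name__ == "__main__":
    rng = random.Random(0)
    for top, left, h, w in (sample_crop(rng=rng) for _ in range(5)):
        print(f"{h}x{w} crop at ({top}, {left}) -> resized to 224x224")
```

At the low end of `scale`, a crop covers about 8% of a 32×32 image (roughly 9×9 pixels), which is then upsampled about 25-fold per side to reach 224×224.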

### Training Results

| Epoch | Training Loss | Test Accuracy |
|-------|---------------|---------------|
| 1     | 0.1969        | 94.62%        |
| 2     | 0.1189        | 95.05%        |
| 3     | 0.0899        | **95.54%**    |
| 4     | 0.0720        | 94.68%        |
| 5     | 0.0650        | 94.84%        |
| 6     | 0.0576        | 94.76%        |
| 7     | 0.0560        | 95.33%        |
| 8     | 0.0488        | 94.31%        |
| 9     | 0.0499        | 95.42%        |
| 10    | 0.0445        | 94.33%        |
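
The headline numbers can be read straight off this table. The short script below copies the table's values verbatim and confirms that 95.54% is the peak test accuracy (epoch 3), while 0.0445 is the training loss of the final epoch, whose test accuracy is 94.33%. The helper name `summarize` is illustrative.

```python
# Per-epoch values copied from the Training Results table.
RESULTS = [
    # (epoch, training_loss, test_accuracy_percent)
    (1, 0.1969, 94.62),
    (2, 0.1189, 95.05),
    (3, 0.0899, 95.54),
    (4, 0.0720, 94.68),
    (5, 0.0650, 94.84),
    (6, 0.0576, 94.76),
    (7, 0.0560, 95.33),
    (8, 0.0488, 94.31),
    (9, 0.0499, 95.42),
    (10, 0.0445, 94.33),
]


def summarize(results):
    """Return the best-accuracy epoch, final-epoch figures, and accuracy spread."""
    best = max(results, key=lambda row: row[2])
    final = results[-1]
    accs = [row[2] for row in results]
    return {
        "best_epoch": best[0],
        "best_accuracy": best[2],
        "final_epoch": final[0],
        "final_loss": final[1],
        "final_accuracy": final[2],
        "accuracy_range": round(max(accs) - min(accs), 2),
    }


if __name__ == "__main__":
    print(summarize(RESULTS))
```

Across all ten epochs, test accuracy stays within a band of about 1.2 percentage points while training loss mostly keeps falling, so the gap between the peak (95.54%) and the final epoch (94.33%) is worth keeping in mind when comparing against other CIFAR-10 results.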

## 🧪 Framework Versions

- `transformers`: 4.50.0
- `torch`: 2.6.0+cu124
- `datasets`: 3.4.1
- `tokenizers`: 0.21.1