Hybrid Food Image Classifier (CNN + ViT)
This model combines ResNet50 (CNN) and DeiT-Base (ViT) through an adaptive fusion module to classify food images into the 101 categories of the Food-101 dataset.
Model Architecture
- CNN Branch: ResNet50 (pretrained on ImageNet)
- ViT Branch: DeiT-Base Distilled (pretrained)
- Fusion Module: Adaptive attention-based fusion with multi-head cross-attention
- Classes: 101 food categories from Food-101 dataset
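The fusion step listed above can be illustrated with a minimal sketch. This is not the released model's code: the class name `CrossAttentionFusion`, the 512-dimensional shared embedding, the 8 attention heads, and the sigmoid gate are all assumptions chosen for illustration. Only the branch feature sizes (2048 for pooled ResNet50 features, 768 for DeiT-Base tokens) and the 101 output classes follow from the components named in the list.

```python
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Hypothetical fusion head; NOT the module stored in best_model.pth."""

    def __init__(self, cnn_dim=2048, vit_dim=768, embed_dim=512,
                 num_heads=8, num_classes=101):
        super().__init__()
        # Project both branches into a shared embedding space (assumed size).
        self.cnn_proj = nn.Linear(cnn_dim, embed_dim)
        self.vit_proj = nn.Linear(vit_dim, embed_dim)
        # Multi-head cross-attention: the CNN feature queries the ViT tokens.
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads,
                                                batch_first=True)
        # Assumed "adaptive" part: a learned gate that mixes the two views.
        self.gate = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim),
                                  nn.Sigmoid())
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, cnn_feat, vit_tokens):
        # cnn_feat: (B, cnn_dim) pooled features; vit_tokens: (B, N, vit_dim)
        query = self.cnn_proj(cnn_feat).unsqueeze(1)        # (B, 1, E)
        keys = self.vit_proj(vit_tokens)                    # (B, N, E)
        attended, _ = self.cross_attn(query, keys, keys)    # (B, 1, E)
        query, attended = query.squeeze(1), attended.squeeze(1)
        g = self.gate(torch.cat([query, attended], dim=-1)) # (B, E) in [0, 1]
        fused = g * query + (1 - g) * attended
        return self.classifier(fused)                       # (B, num_classes)
```

With random inputs shaped like a batch of two images, `CrossAttentionFusion()(torch.randn(2, 2048), torch.randn(2, 198, 768))` returns logits of shape `(2, 101)`; 198 is the DeiT distilled token count at 224x224 (196 patches plus class and distillation tokens).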
Performance
- Validation Accuracy: ~82.5%
- Top-5 Accuracy: >95%
Files
- best_model.pth: Trained PyTorch checkpoint
- real_class_mapping.json: Human-readable class names
- config.yaml: Training configuration
- food101_class_names.json: Original class names
Quick Usage
from huggingface_hub import hf_hub_download
import torch
# Download model
ckpt_path = hf_hub_download(
repo_id="codealchemist01/food-image-classifier-hybrid",
filename="best_model.pth"
)
# Load checkpoint
checkpoint = torch.load(ckpt_path, map_location="cpu")
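The layout of the loaded checkpoint is not documented here, so it may help to inspect it before building a model. The helper below is a sketch under that uncertainty: `extract_state_dict` is a hypothetical name, and the keys it tries (`model_state_dict`, `state_dict`, `model`) are common PyTorch conventions, not confirmed keys of best_model.pth.

```python
from collections.abc import Mapping

import torch


def extract_state_dict(checkpoint):
    """Return the weights mapping from a loaded checkpoint object.

    The key names tried here are assumptions based on common conventions;
    if none is present, the checkpoint itself is returned unchanged.
    """
    if isinstance(checkpoint, Mapping):
        for key in ("model_state_dict", "state_dict", "model"):
            value = checkpoint.get(key)
            if isinstance(value, Mapping):
                return value
    return checkpoint


def summarize(state_dict, limit=5):
    """List the first few parameter names and their tensor shapes."""
    rows = []
    for name, tensor in list(state_dict.items())[:limit]:
        shape = tuple(tensor.shape) if torch.is_tensor(tensor) else None
        rows.append((name, shape))
    return rows
```

Calling `summarize(extract_state_dict(checkpoint))` on the object loaded above prints a few parameter names, which is usually enough to tell which model class the weights expect.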
Demo
Try the live demo: Food Classifier Space
Training Details
- Dataset: Food-101 (101,000 images across 101 categories)
- Framework: PyTorch 2.0+
- Image Size: 224x224
- Optimizer: AdamW with cosine annealing warm restarts
- Augmentations: Albumentations (flip, rotation, color jitter)
- Mixed Precision: FP16 training
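The optimizer, schedule, and mixed-precision settings listed above could be wired together roughly as follows. This is a sketch, not the training script: the function names `build_training_state` and `train_step` are hypothetical, and the learning rate, weight decay, `T_0`, and `T_mult` values are placeholder assumptions (the actual values would be in config.yaml). FP16 autocast is enabled only on CUDA devices and falls back to full precision on CPU.

```python
import torch
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts


def build_training_state(model, device, lr=1e-4, weight_decay=0.05,
                         t_0=10, t_mult=2):
    """Create optimizer, scheduler, and grad scaler (hyperparameters assumed)."""
    optimizer = AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    # The learning rate follows a cosine curve and restarts after t_0 steps,
    # with each following cycle t_mult times longer.
    scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=t_0, T_mult=t_mult)
    use_amp = device.type == "cuda"
    # With enabled=False the scaler is a pass-through, so CPU runs still work.
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    return optimizer, scheduler, scaler, use_amp


def train_step(model, images, labels, optimizer, scaler, use_amp):
    """Run one optimization step with optional FP16 autocast."""
    criterion = nn.CrossEntropyLoss()
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16,
                        enabled=use_amp):
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()  # scales the loss only when AMP is on
    scaler.step(optimizer)         # unscales gradients, then optimizer.step()
    scaler.update()
    return loss.item()
```

In a full loop, `scheduler.step()` would be called once per epoch (or per batch with a fractional epoch argument) after the optimizer steps.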
Citation
@misc{food-classifier-hybrid,
author = {codealchemist01},
title = {Hybrid Food Image Classifier},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/codealchemist01/food-image-classifier-hybrid}}
}