Digit & Blank Image Classifier (PyTorch CNN)
A high-accuracy convolutional neural network trained to classify handwritten digits from the MNIST and EMNIST Digits datasets, and additionally detect blank images (unfilled boxes) as a distinct class. This model is trained using PyTorch and exported in TorchScript format (.pt) for reliable and portable inference.
License & Attribution
This model is licensed under the AGPL-3.0 license to comply with the Plom Project licensing requirements.
Developed as part of the Plom Project
Authors & Credits:
- Model: Deep Shah, Undergraduate Research Assistant, UBC
- Supervision: Prof. Andrew Rechnitzer and Prof. Colin B. MacDonald
- Project: The Plom Project GitLab
Overview
- Input: 1×28×28 grayscale image
- Output: Integer class prediction:
  - 0–9: Digits
  - 10: Blank image
- Architecture: 3-layer CNN with BatchNorm, ReLU, MaxPooling, Dropout, Fully Connected Layers
- Model Format: TorchScript (.pt), ONNX (.onnx)
- Training Dataset: Combined MNIST, EMNIST Digits, and 5,000 synthetic blank images
Dataset Details
Datasets Used:
- MNIST: 28×28 handwritten digits (0–9), 60,000 training images
- EMNIST Digits: 28×28 digits extracted from handwritten characters, 240,000+ training samples
- Blank Images: 5,000 synthetic all-black 28×28 images, labeled as class 10 to simulate unfilled regions
Preprocessing:
- Normalized pixel values to [0, 1]
- Converted images to channel-first format (N, C, H, W)
- Combined and shuffled datasets
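The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration using NumPy with random stand-in arrays rather than the real MNIST/EMNIST data; the function name `preprocess`, the sample counts, and the use of a single shared permutation for shuffling are assumptions, not details taken from the training script.

```python
import numpy as np

def preprocess(images_uint8: np.ndarray) -> np.ndarray:
    """Scale uint8 images (N, 28, 28) to float32 in [0, 1], shaped (N, 1, 28, 28)."""
    scaled = images_uint8.astype(np.float32) / 255.0
    return scaled[:, np.newaxis, :, :]  # add the channel axis (channel-first)

rng = np.random.default_rng(42)

# Stand-in digit data (hypothetical; the real data comes from MNIST/EMNIST).
digit_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
digit_labels = rng.integers(0, 10, size=100)

# Synthetic blank images: all-black, labeled as class 10.
blank_images = np.zeros((20, 28, 28), dtype=np.uint8)
blank_labels = np.full(20, 10)

# Combine, then shuffle images and labels with the same permutation.
x = preprocess(np.concatenate([digit_images, blank_images]))
y = np.concatenate([digit_labels, blank_labels])
perm = rng.permutation(len(x))
x, y = x[perm], y[perm]

print(x.shape, x.dtype, float(x.min()), float(x.max()))
```

Shuffling with one permutation keeps every image paired with its label, which matters once the blank class is mixed in with the digits.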
Data Augmentation
To improve generalization and robustness to handwriting variation:
- RandomRotation (±10°)
- RandomAffine: scale (0.9–1.1), translate (±10%)

These transformations simulate the noise and handwriting variation found in real student submissions.
Model Architecture
Input: (1, 28, 28)
↓ Conv2D(1 → 32) + BatchNorm + ReLU
↓ Conv2D(32 → 64) + BatchNorm + ReLU
↓ MaxPool2d(2x2) + Dropout(0.1)
↓ Conv2D(64 → 128) + BatchNorm + ReLU
↓ MaxPool2d(2x2) + Dropout(0.1)
↓ Flatten
↓ Linear(128*7*7 → 128) + BatchNorm + ReLU + Dropout(0.2)
↓ Linear(128 → 11)
↓ Output: class logits (digits 0–9, blank = 10)
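The layer stack above could be written as a PyTorch module along these lines. The class name `DigitBlankCNN` is hypothetical, and the 3×3 kernels with padding 1 are assumptions chosen so that two 2×2 poolings reduce 28×28 to 7×7, which matches the `128*7*7` input of the first linear layer.

```python
import torch
from torch import nn

class DigitBlankCNN(nn.Module):
    """Sketch of the layer stack above; kernel size 3 and padding 1 are assumptions."""

    def __init__(self, num_classes: int = 11):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2), nn.Dropout(0.1),   # 28x28 -> 14x14
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.MaxPool2d(2), nn.Dropout(0.1),   # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 7 * 7, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(128, num_classes),        # logits for digits 0-9 and blank (10)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = DigitBlankCNN().eval()
with torch.no_grad():
    logits = model(torch.rand(4, 1, 28, 28))
print(logits.shape)
```

Calling `.eval()` before inference switches BatchNorm to its running statistics and disables Dropout, so the output for a given input is deterministic.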
Training Configuration
| Hyperparameter | Value |
|---|---|
| Optimizer | Adam (lr=0.001) |
| Loss Function | CrossEntropyLoss |
| Scheduler | ReduceLROnPlateau |
| Early Stopping | Patience = 5 |
| Epochs | Max 50 |
| Batch Size | 64 |
| Device | CPU or CUDA |
| Random Seed | 42 |
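A training loop using these settings might look like the sketch below. It runs on a tiny stand-in model and random data so that it finishes quickly; monitoring validation loss, the scheduler's `factor` and `patience`, and the data sizes are all assumptions, since the table does not specify them.

```python
import torch
from torch import nn

torch.manual_seed(42)

# Tiny stand-in model and data (not the real CNN or dataset).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 11))
x_train, y_train = torch.rand(256, 1, 28, 28), torch.randint(0, 11, (256,))
x_val, y_val = torch.rand(64, 1, 28, 28), torch.randint(0, 11, (64,))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Assumed scheduler settings: halve the LR after 2 epochs without improvement.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2
)

max_epochs, patience, batch_size = 50, 5, 64
best_val_loss, epochs_without_improvement, epochs_run = float("inf"), 0, 0

for epoch in range(max_epochs):
    model.train()
    order = torch.randperm(len(x_train))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        optimizer.zero_grad()
        loss = criterion(model(x_train[idx]), y_train[idx])
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val).item()
    scheduler.step(val_loss)   # lower the LR when validation loss plateaus
    epochs_run = epoch + 1

    if val_loss < best_val_loss:
        best_val_loss, epochs_without_improvement = val_loss, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break              # early stopping with patience = 5

print(f"stopped after {epochs_run} epochs, best val loss {best_val_loss:.4f}")
```

In a real run the best-performing weights would typically be saved whenever the validation loss improves, so that early stopping can restore them.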
Evaluation Results
| Metric | Value |
|---|---|
| Test Accuracy | 99.73% |
| Blank Image Accuracy | 100.00% |
All 5,000 blank images were correctly classified.
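For reference, the two metrics in the table can be computed from label and prediction arrays as sketched here. The helper name `accuracy_report` is hypothetical and the arrays are made-up toy values, not the model's actual outputs.

```python
import numpy as np

BLANK_CLASS = 10

def accuracy_report(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Return overall accuracy and accuracy restricted to true blank images."""
    correct = y_true == y_pred
    blank_mask = y_true == BLANK_CLASS
    return {
        "overall": float(correct.mean()),
        "blank": float(correct[blank_mask].mean()),
    }

# Toy labels and predictions for illustration only.
y_true = np.array([3, 10, 7, 10, 1, 9])
y_pred = np.array([3, 10, 7, 10, 1, 4])
report = accuracy_report(y_true, y_pred)
print(report)
```

Reporting the blank class separately is useful because its share of the combined dataset is small, so overall accuracy alone would hide errors on it.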
Inference Examples
1. TorchScript (PyTorch)
import torch
# Load TorchScript model
model = torch.jit.load("mnist_emnist_blank_cnn_v1.pt")
model.eval()
# Dummy input (1 image, 1 channel, 28x28) with values in [0, 1], matching the training preprocessing
img = torch.rand(1, 1, 28, 28)
# Predict
with torch.no_grad():
    out = model(img)
    predicted = out.argmax(dim=1).item()
print("Predicted class:", predicted)
2. ONNX (ONNX Runtime)
import onnxruntime as ort
import numpy as np
# Load ONNX model
session = ort.InferenceSession("mnist_emnist_blank_cnn_v1.onnx", providers=["CPUExecutionProvider"])
# Dummy input with values in [0, 1], matching the training preprocessing
img = np.random.rand(1, 1, 28, 28).astype(np.float32)
# Predict
outputs = session.run(None, {"input": img})
predicted = int(outputs[0].argmax(axis=1)[0])
print("Predicted class:", predicted)
If the prediction is 10, the model considers the image to be blank (no digits present).
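Downstream code often wants a readable label and a confidence value rather than a raw class index. The sketch below shows one way to derive both from the 11 logits of a single image; the helper name `interpret_logits` and the example logits are hypothetical.

```python
import numpy as np

BLANK_CLASS = 10

def interpret_logits(logits: np.ndarray) -> tuple[str, float]:
    """Return a readable label and softmax probability for one image's 11 logits."""
    shifted = logits - logits.max()              # stabilize the exponentials
    probs = np.exp(shifted) / np.exp(shifted).sum()
    cls = int(probs.argmax())
    label = "blank" if cls == BLANK_CLASS else str(cls)
    return label, float(probs[cls])

# Made-up logits in which the blank class dominates.
example_logits = np.zeros(11, dtype=np.float32)
example_logits[BLANK_CLASS] = 5.0
label, confidence = interpret_logits(example_logits)
print(label, round(confidence, 3))
```

With either inference example above, the logits for the first image would be `out[0].numpy()` (TorchScript) or `outputs[0][0]` (ONNX Runtime).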
Included Files
- train_digit_classifier.py: Training script with full documentation
- mnist_emnist_blank_cnn_v1.pth: Final trained model weights
- mnist_emnist_blank_cnn_v1.pt: TorchScript export for deployment
- mnist_emnist_blank_cnn_v1.onnx: ONNX export for deployment
- requirements.txt: Required dependencies for training or inference
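As a rough illustration of how a TorchScript file relates to a trained model, the sketch below traces a small stand-in network, saves it, and reloads it. Whether the original export used tracing or scripting is not stated, so tracing is an assumption here, and the stand-in model is hypothetical.

```python
import io
import torch
from torch import nn

# Stand-in network (hypothetical); in practice this would be the trained CNN
# with its weights loaded from the .pth file.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 11)).eval()

example = torch.rand(1, 1, 28, 28)
scripted = torch.jit.trace(model, example)   # assumed export method

# Save to an in-memory buffer; passing a file path works the same way.
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)

reloaded = torch.jit.load(buffer)
with torch.no_grad():
    same = torch.allclose(model(example), reloaded(example))
print("outputs match:", same)
```

Comparing outputs before and after the round trip is a cheap sanity check that the exported file behaves like the original model.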
Intended Use
This model was designed to support the Plom Project's student ID digit detection system, helping automatically identify handwritten digits (and detect blank/unfilled boxes) from scanned exam sheets.
It may also be adapted for other handwritten digit classification tasks or real-time blank field detection applications.