Model Card for trocr-base-handwritten_nj_biergarten_captcha_v2
This is a model for CAPTCHA OCR.
Model Details
Model Description
This is a simple model fine-tuned from microsoft/trocr-base-handwritten on a dataset
I created, phunc20/nj_biergarten_captcha_v2.
Uses
Direct Use
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Use the GPU when one is available, otherwise fall back to the CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

hub_dir = "phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2"
processor = TrOCRProcessor.from_pretrained(hub_dir)
model = VisionEncoderDecoderModel.from_pretrained(hub_dir)
model = model.to(device)

# Convert to RGB, since CAPTCHA files may be grayscale or carry an alpha channel
image = Image.open("/path/to/image").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values
pixel_values = pixel_values.to(device)
outputs = model.generate(pixel_values)
pred_str = processor.batch_decode(outputs, skip_special_tokens=True)[0]
Bias, Risks, and Limitations
Although the model performs well on the test split of phunc20/nj_biergarten_captcha_v2
(496/500 exact matches), it does not generalize to all CAPTCHA images: on the Kaggle dataset
described under Testing Data it reaches only 69/1070 exact matches. In this respect, the
model is worse than a human.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
Use the code in the Direct Use section above to get started with the model.
Training Details
Training Data
As mentioned above, I trained this model on phunc20/nj_biergarten_captcha_v2.
In particular, I trained on the train split and evaluated on the validation split,
without touching the test split.
Training Procedure
Please refer to the notebook at https://gitlab.com/phunc20/captchew/-/blob/main/colab_notebooks/train_from_pretrained_Seq2SeqTrainer_torchDataset.ipynb?ref_type=heads, which is adapted from https://github.com/NielsRogge/Transformers-Tutorials/blob/master/TrOCR/Fine_tune_TrOCR_on_IAM_Handwriting_Database_using_Seq2SeqTrainer.ipynb
Evaluation
Testing Data, Factors & Metrics
Testing Data
- The test split of phunc20/nj_biergarten_captcha_v2
- This Kaggle dataset: https://www.kaggle.com/datasets/fournierp/captcha-version-2-images/data
  (we shall call this dataset kaggle_test_set in this model card.)
Factors
[More Information Needed]
Metrics
CER (character error rate), exact match and average length difference. The first two are described in Hugging Face's documentation. The last one is simply a metric I care a little about. It is quite easy to understand and, if need be, an explanation can be found in the source code: https://gitlab.com/phunc20/captchew/-/blob/v0.1/average_length_difference.py
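For readers who want a feel for the three metrics, here is a minimal pure-Python sketch. It is not the Hugging Face implementation, nor the code in average_length_difference.py. In particular, the definition of average length difference used here (mean absolute difference between predicted and reference string lengths) is an assumption on my part; the linked source file is the authoritative definition. All function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def cer(preds, refs):
    """Character error rate: total edits divided by total reference characters.
    Can exceed 1 when predictions are much longer than the references."""
    total_edits = sum(edit_distance(p, r) for p, r in zip(preds, refs))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def exact_match_count(preds, refs):
    """Number of predictions identical to their reference."""
    return sum(p == r for p, r in zip(preds, refs))


def average_length_difference(preds, refs):
    """ASSUMPTION: mean of |len(pred) - len(ref)|; see the linked
    source file for the definition actually used in this model card."""
    return sum(abs(len(p) - len(r)) for p, r in zip(preds, refs)) / len(refs)


# Made-up strings for illustration only, not taken from any dataset
preds = ["ab12", "xyz", "hello"]
refs = ["ab12", "xyzw", "hallo"]

print(cer(preds, refs))
print(exact_match_count(preds, refs))
print(average_length_difference(preds, refs))
```

Because the CER sketch divides by the number of reference characters, a value above 1 is possible when the model emits far more characters than the ground truth contains.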
Results
On the test split of phunc20/nj_biergarten_captcha_v2
| Model | CER | Exact match | Avg len diff |
|---|---|---|---|
| phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2 | 0.001333 | 496/500 | 1/500 |
| microsoft/trocr-base-handwritten | 0.9 | 5/500 | 2.4 |
On kaggle_test_set
| Model | CER | Exact match | Avg len diff |
|---|---|---|---|
| phunc20/trocr-base-handwritten_nj_biergarten_captcha_v2 | 0.4381 | 69/1070 | 0.1289 |
| microsoft/trocr-base-handwritten | 1.0112 | 17/1070 | 2.4439 |
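Since the two test sets differ in size, the exact-match counts are easier to compare as rates. The short sketch below converts the counts reported in the tables; the labels "fine-tuned" and "base" are just my shorthand for the two models, and the helper name is hypothetical.

```python
# Exact-match counts copied from the two result tables: (correct, total)
results = {
    "test split": {"fine-tuned": (496, 500), "base": (5, 500)},
    "kaggle_test_set": {"fine-tuned": (69, 1070), "base": (17, 1070)},
}


def exact_match_rate(correct: int, total: int) -> float:
    """Fraction of images whose predicted string equals the label."""
    return correct / total


for dataset, models in results.items():
    for name, (correct, total) in models.items():
        rate = exact_match_rate(correct, total)
        print(f"{dataset:>16} | {name:>10} | {rate:.2%}")
```

The fine-tuned model goes from about 99.2% exact matches on its own test split to about 6.4% on kaggle_test_set, which is the generalization gap noted under Bias, Risks, and Limitations.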
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]