---
license: mit
language:
- de
metrics:
- cer
library_name: transformers
tags:
- trocr
- kurrent
- ocr
- htr
- 19th century
base_model:
- microsoft/trocr-base-handwritten
pipeline_tag: image-text-to-text
---

# TrOCR Kurrent-Model 19th century

Handwritten Text Recognition (HTR) model for 19th-century German. Part of the developments at [Digital Humanities@University of Bern](https://www.dh.unibe.ch/). Developed by Jonas Widmer and Tobias Hodel in conjunction with the researchers and institutions mentioned below.

- Base model: **microsoft/trocr-base-handwritten**
- Train lines: 292'997
- Eval lines: 7'513
- Test lines: 15'817
- Epochs: 19.66 / 20
- Eval CER: 0.02827
- Test CER: 0.02655

Fine-tuned on the Kurrent dataset, which contains:

- Material from the State Archives of Zurich ("Regierungsratsprotokolle"), provided by the State Archives of Zurich
- Lecture notes of the Humboldt lectures, provided by the Berlin-Brandenburg Academy of Sciences and Humanities
- Diary of Eugen Huber, provided by the University of Zurich
- Handwriting and copies by and of Gottfried Semper, provided by the respective research project at ETH Zürich and USI Mendrisio
- Konzilsprotokolle of the University of Greifswald (19th century)
- Many other smaller collections and examples

The model has not been extensively tested. Potential biases are still to be identified.
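
To make the reported CER figures easier to interpret, the following is a minimal sketch of how a character error rate is commonly computed: the character-level edit distance between the recognized text and the ground truth, divided by the number of ground-truth characters. It assumes the standard definition, pooled over all lines. The evaluation script actually used for this model is not described here, so details such as text normalization or per-line averaging may differ. The function names `levenshtein` and `cer` and the sample lines are illustrative only.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Character error rate pooled over all lines (an assumption:
    total edits divided by total reference characters)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


# Illustrative lines, not taken from the dataset.
refs = ["Sitzung des Regierungsrates", "den 3. Januar"]
hyps = ["Sitzung des Regirungsrates", "den 3. Januar"]
print(cer(refs, hyps))  # 1 edit over 40 reference characters -> 0.025
```

Under this definition, the test CER of 0.02655 corresponds to roughly 2.7 character errors per 100 ground-truth characters.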