Add the PyLaia model trained on Himanis (#1)
- Add the PyLaia model trained on Himanis (f24bab3501a5f059d9ae75e7ee85788b528b6400)
- README.md +38 -0
- language_model.arpa.gz +3 -0
- lexicon.txt +130 -0
- model +0 -0
- syms.txt +130 -0
- tokens.txt +130 -0
- weights.ckpt +3 -0
README.md
CHANGED
@@ -1,3 +1,41 @@
 ---
+library_name: PyLaia
 license: mit
+tags:
+- PyLaia
+- PyTorch
+- Handwritten text recognition
+metrics:
+- CER
+- WER
+language:
+- 'fr'
 ---
+
+# Himanis handwritten text recognition
+
+This model performs Handwritten Text Recognition in French on medieval documents.
+
+## Model description
+
+The model was trained using the PyLaia library on two medieval datasets:
+* [Himanis](https://demo.arkindex.org/browse/5000e248-a624-4df1-8679-1b34679817ef?top_level=true&folder=true) (French)
+* [HOME Alcar](https://demo.arkindex.org/browse/46b9b1f4-baeb-4342-a501-e2f15472a276?top_level=true&folder=true) (Latin)
+
+For training, text-lines were resized with a fixed height of 128 pixels, keeping the original aspect ratio.
+
+An external 6-gram character language model can be used to improve recognition. The language model is trained on the text from the Himanis training set.
+
+## Evaluation results
+
+The model achieves the following results:
+
+| set   | Language model | CER (%)    | WER (%) | N lines   |
+|:------|:---------------|:----------:|:-------:|----------:|
+| test  | no             | 9.87       | 29.25   |   2241    |
+| test  | yes            | 8.87       | 24.37   |   2241    |
+
+## How to use
+
+Please refer to the [documentation](https://atr.pages.teklia.com/pylaia/).
+
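The README added in this commit says that text lines were resized to a fixed height of 128 pixels while keeping the original aspect ratio. A minimal sketch of that size computation follows. The helper name `target_size`, the rounding to the nearest pixel, and the minimum width of 1 are assumptions made for illustration; they are not taken from PyLaia.

```python
from typing import Tuple


def target_size(width: int, height: int, fixed_height: int = 128) -> Tuple[int, int]:
    """Return (new_width, new_height) for a line image scaled to fixed_height.

    Hypothetical helper: the width is scaled by the same factor as the
    height, so the aspect ratio is preserved up to rounding.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = fixed_height / height
    # Round to the nearest pixel and never return a zero-width image.
    new_width = max(1, round(width * scale))
    return new_width, fixed_height
```

The resulting pair could be passed to an image library's resize call (for example Pillow's `Image.resize`) before feeding the line image to the model.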
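The evaluation table in the README reports CER and WER without saying how they were computed. Both are commonly defined as the Levenshtein edit distance between reference and hypothesis, divided by the reference length, at character and word level respectively. The sketch below assumes that definition; the function names are illustrative and the exact normalization used for the reported numbers is not stated in the source.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent (assumes a non-empty reference)."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, splitting on whitespace."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

A single wrong character in a four-word line counts once for CER but makes a whole word wrong for WER, which is one reason the WER column is much higher than the CER column.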
language_model.arpa.gz
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1b13e7e6745d8d8288edb133ba606857c0ad7f2f126abed599ffd2cadc2b285a
+size 13091444
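The three added lines are a Git LFS pointer: the repository stores only the spec version, a SHA-256 object id, and the byte size, while the actual file lives in LFS storage. As a sketch, a pointer can be parsed and used to check a downloaded file as shown below. The helper names are hypothetical, and the parser assumes one `key value` pair per line as in the pointer above.

```python
import hashlib
from typing import Dict, Union


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse a Git LFS pointer into version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: Dict[str, Union[str, int]]) -> bool:
    """Return True if data has the size and digest recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]
```

For a file as large as the language model, hashing in chunks from disk would be preferable to loading the whole file into memory; the in-memory version is kept here for brevity.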
lexicon.txt
ADDED
@@ -0,0 +1,130 @@
+<ctc> <ctc>
+! !
+& &
+# #
+' '
+( (
+) )
+* *
++ +
+, ,
+- -
+. .
+/ /
+0 0
+1 1
+2 2
+3 3
+4 4
+5 5
+6 6
+7 7
+8 8
+9 9
+: :
+; ;
+= =
+? ?
+A A
+B B
+C C
+D D
+E E
+F F
+G G
+H H
+I I
+J J
+K K
+L L
+M M
+N N
+O O
+P P
+Q Q
+R R
+S S
+T T
+U U
+V V
+W W
+X X
+Y Y
+Z Z
+[ [
+] ]
+a a
+b b
+c c
+d d
+e e
+f f
+g g
+h h
+i i
+j j
+k k
+l l
+m m
+n n
+o o
+p p
+q q
+r r
+s s
+t t
+u u
+v v
+w w
+x x
+y y
+z z
+| |
+~ ~
+
+© ©
+§ §
+ª ª
+« «
+¬ ¬
+¯ ¯
+° °
+¶ ¶
+º º
+» »
+¿ ¿
+À À
+Â Â
+Ã Ã
+Ç Ç
+É É
+Ï Ï
+Ü Ü
+à à
+á á
+â â
+æ æ
+ç ç
+è è
+é é
+ë ë
+ì ì
+í í
+î î
+ï ï
+ñ ñ
+ú ú
+ù ù
+û û
+ÿ ÿ
+ę ę
+ō ō
+œ œ
+ȩ ȩ
+— —
+‘ ‘
+’ ’
+… …
+† †
+<unk> <unk>
+<space> <space>
model
ADDED
Binary file (1.52 kB).
syms.txt
ADDED
@@ -0,0 +1,130 @@
+<ctc> 0
+! 1
+& 2
+# 3
+' 4
+( 5
+) 6
+* 7
++ 8
+, 9
+- 10
+. 11
+/ 12
+0 13
+1 14
+2 15
+3 16
+4 17
+5 18
+6 19
+7 20
+8 21
+9 22
+: 23
+; 24
+= 25
+? 26
+A 27
+B 28
+C 29
+D 30
+E 31
+F 32
+G 33
+H 34
+I 35
+J 36
+K 37
+L 38
+M 39
+N 40
+O 41
+P 42
+Q 43
+R 44
+S 45
+T 46
+U 47
+V 48
+W 49
+X 50
+Y 51
+Z 52
+[ 53
+] 54
+a 55
+b 56
+c 57
+d 58
+e 59
+f 60
+g 61
+h 62
+i 63
+j 64
+k 65
+l 66
+m 67
+n 68
+o 69
+p 70
+q 71
+r 72
+s 73
+t 74
+u 75
+v 76
+w 77
+x 78
+y 79
+z 80
+| 81
+~ 82
+ 83
+© 84
+§ 85
+ª 86
+« 87
+¬ 88
+¯ 89
+° 90
+¶ 91
+º 92
+» 93
+¿ 94
+À 95
+Â 96
+Ã 97
+Ç 98
+É 99
+Ï 100
+Ü 101
+à 102
+á 103
+â 104
+æ 105
+ç 106
+è 107
+é 108
+ë 109
+ì 110
+í 111
+î 112
+ï 113
+ñ 114
+ú 115
+ù 116
+û 117
+ÿ 118
+ę 119
+ō 120
+œ 121
+ȩ 122
+— 123
+‘ 124
+’ 125
+… 126
+† 127
+<unk> 128
+<space> 129
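`syms.txt` maps each symbol to an integer index, with `<ctc>` at 0 and `<space>` at 129. The sketch below shows how such a table could be loaded and used for best-path (greedy) CTC decoding: collapse consecutive repeats, drop the blank, and turn `<space>` into a real space. It assumes each line is `symbol`, one ASCII space, then the index; greedy decoding is the textbook CTC baseline and is not claimed to be what PyLaia runs internally. The function names are illustrative.

```python
from typing import Dict, Iterable


def parse_syms(text: str) -> Dict[int, str]:
    """Build an index -> symbol table from the contents of a syms file."""
    table = {}
    # Split on "\n" only, so unusual symbols are not treated as line breaks.
    for line in text.split("\n"):
        if not line.strip():
            continue
        # Split on the last ASCII space so whitespace-like symbols survive.
        symbol, index = line.rsplit(" ", 1)
        table[int(index)] = symbol
    return table


def ctc_greedy_decode(indices: Iterable[int], table: Dict[int, str],
                      blank: str = "<ctc>", space: str = "<space>") -> str:
    """Collapse repeated indices, remove blanks, and join symbols."""
    out = []
    previous = None
    for idx in indices:
        if idx != previous:
            symbol = table[idx]
            if symbol != blank:
                out.append(" " if symbol == space else symbol)
        previous = idx
    return "".join(out)
```

Here `indices` stands for the per-frame argmax of the network output; a blank between two identical indices is what allows a doubled letter to be emitted.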
tokens.txt
ADDED
@@ -0,0 +1,130 @@
+<ctc>
+!
+&
+#
+'
+(
+)
+*
++
+,
+-
+.
+/
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+:
+;
+=
+?
+A
+B
+C
+D
+E
+F
+G
+H
+I
+J
+K
+L
+M
+N
+O
+P
+Q
+R
+S
+T
+U
+V
+W
+X
+Y
+Z
+[
+]
+a
+b
+c
+d
+e
+f
+g
+h
+i
+j
+k
+l
+m
+n
+o
+p
+q
+r
+s
+t
+u
+v
+w
+x
+y
+z
+|
+~
+
+©
+§
+ª
+«
+¬
+¯
+°
+¶
+º
+»
+¿
+À
+Â
+Ã
+Ç
+É
+Ï
+Ü
+à
+á
+â
+æ
+ç
+è
+é
+ë
+ì
+í
+î
+ï
+ñ
+ú
+ù
+û
+ÿ
+ę
+ō
+œ
+ȩ
+—
+‘
+’
+…
+†
+<unk>
+<space>
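In this commit, `tokens.txt`, `lexicon.txt`, and `syms.txt` appear to list the same 130 symbols in the same order: one token per line, each token mapped to itself, and each token paired with its zero-based position. A small sketch of a consistency check under that reading is given below; the function names are hypothetical and the expected file layouts are inferred from the diff rather than from any format specification.

```python
from typing import List


def read_lines(text: str) -> List[str]:
    """Split file contents on newlines, dropping only truly empty lines."""
    return [line for line in text.split("\n") if line != ""]


def files_consistent(tokens_text: str, lexicon_text: str, syms_text: str) -> bool:
    """Check that the three vocabulary files describe the same ordered symbols."""
    tokens = read_lines(tokens_text)
    lexicon = [tuple(line.split(" ", 1)) for line in read_lines(lexicon_text)]
    syms = [tuple(line.rsplit(" ", 1)) for line in read_lines(syms_text)]
    if not (len(tokens) == len(lexicon) == len(syms)):
        return False
    for position, (token, lex, sym) in enumerate(zip(tokens, lexicon, syms)):
        # lexicon: "token token"; syms: "token position"
        if lex != (token, token) or sym != (token, str(position)):
            return False
    return True
```

Running such a check before decoding with the language model is a cheap way to catch a vocabulary file that was edited or regenerated independently of the others.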
weights.ckpt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d11c5c5b6b01a8a45ab48f5433e6f86a5266b8c55fc82c68ac05cd3fe2f9c2a7
+size 42863420