Automatic Speech Recognition
Transformers
Safetensors
Yiddish
whisper
yoad committed · verified
Commit ee856dd · 1 Parent(s): f37cde4

Create README.md

Files changed (1): README.md ADDED (+85 −0)
---
library_name: transformers
license: apache-2.0
datasets:
- ivrit-ai/crowd-recital-yi-whisper-training
- ivrit-ai/crowd-whatsapp-yi-whisper-training
language:
- yi
metrics:
- wer
base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
---

# Model Card

This model is a Yiddish finetune (continued training) of the OpenAI Whisper Large v3 Turbo model.

## Model Details

### Model Description

- **Developed by:** ivrit-ai
- **Language(s) (NLP):** Yiddish
- **License:** Apache-2.0
- **Finetuned from model:** openai/whisper-large-v3-turbo
- **Training Date:** Oct 2025

## Bias, Risks, and Limitations

The language detection capability of this model was degraded during training - it is intended for mostly-Yiddish audio transcription.
The language token should therefore be set explicitly to Yiddish (see the usage example below).

Additionally, the translation task was not trained and has also degraded; this model will not be able to translate with any reasonable quality.

## How to Get Started with the Model

Please follow the original [model card](https://huggingface.co/openai/whisper-large-v3#usage) for usage details, replacing the model name with this one.
You can also find other weight formats and quantizations on the [ivrit-ai](https://huggingface.co/ivrit-ai) HF page.

We provide simple example scripts for using this model's weights with other inference runtimes.
Find those in the ["examples"](https://github.com/ivrit-ai/asr-training/tree/master/examples) folder within the training GitHub repo.
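
A minimal transcription sketch, adapted from the original Whisper usage example rather than taken from this repo (the model id below is a placeholder; replace it with this model's actual repo id):

```python
# Minimal sketch adapted from the openai/whisper-large-v3 usage example - not an official snippet.
import torch
from transformers import pipeline

model_id = "ivrit-ai/<this-model>"  # placeholder - replace with this model's repo id
device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device=device,
)

# Language detection is degraded (see above), so set the language token to Yiddish explicitly.
result = asr(
    "sample.wav",
    chunk_length_s=30,
    generate_kwargs={"language": "yiddish", "task": "transcribe"},
)
print(result["text"])
```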

## Training Details

### Training Data

This model was trained on the following datasets:

- [ivrit-ai/crowd-recital-yi-whisper-training](https://huggingface.co/datasets/ivrit-ai/crowd-recital-yi-whisper-training) - Crowd-sourced recordings of Wikipedia/Michlol article snippets. ~78h
- [ivrit-ai/crowd-whatsapp-yi-whisper-training](https://huggingface.co/datasets/ivrit-ai/crowd-whatsapp-yi-whisper-training) - Crowd-sourced WhatsApp-based voice recordings of predefined prompts. ~19h

### Training Procedure

This model was trained in two main phases:
- Pre-training over both the recital and WhatsApp datasets.
- Post-training on the WhatsApp dataset only.

Training code can be found in the ivrit-ai GitHub repository [here](https://github.com/ivrit-ai/asr-training).

#### Preprocessing

The "Crowd Recital" and "WhatsApp" datasets contain timestamps and previous-segment text, following Whisper's expected inputs.
Timestamps were used for 50% of the samples from those datasets, and previous text was used for 50% of the samples; a sketch of this conditioning follows.

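Purely as an illustration of the 50%/50% conditioning described above (not the actual preprocessing code), a sketch might look like this; the field names `plain_text`, `timestamped_text`, and `prev_text` are hypothetical:

```python
# Illustrative sketch of the 50% timestamps / 50% previous-text conditioning -
# not the actual ivrit-ai preprocessing code. Field names below are hypothetical.
import random

def build_training_target(sample: dict, p_timestamps: float = 0.5, p_prev_text: float = 0.5) -> dict:
    # Keep Whisper timestamp tokens for roughly half of the samples.
    use_timestamps = random.random() < p_timestamps
    text = sample["timestamped_text"] if use_timestamps else sample["plain_text"]

    # Condition on the previous segment's text (Whisper's "prompt") for roughly half of the samples.
    prompt = sample["prev_text"] if random.random() < p_prev_text else None

    return {"prompt": prompt, "target_text": text, "has_timestamps": use_timestamps}
```
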
Preprocessing code can be found within the training code [repository](https://github.com/ivrit-ai/asr-training).

Datasets were interleaved at a 0.915:0.085 ratio (recital:WhatsApp) during the pre-training phase, as illustrated below.

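As an illustration only (assuming the Hugging Face `datasets` library and a `train` split), the interleaving step corresponds roughly to:

```python
# Sketch of the dataset interleaving step - the actual implementation is in ivrit-ai/asr-training.
from datasets import load_dataset, interleave_datasets

recital = load_dataset("ivrit-ai/crowd-recital-yi-whisper-training", split="train")
whatsapp = load_dataset("ivrit-ai/crowd-whatsapp-yi-whisper-training", split="train")

# Recital:WhatsApp sampling ratio used during the pre-training phase.
mixed = interleave_datasets([recital, whatsapp], probabilities=[0.915, 0.085], seed=42)
```
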
#### Training Hyperparameters

- **Training regime:** bf16 mixed precision with SDPA attention
- **Learning rate:** 5e-6 with linear decay and 500 warmup steps, for 4 epochs, followed by an additional 200 steps on the WhatsApp data only at a learning rate of 1e-6
- **Batch size:** 32
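
The full configuration lives in the training repo; as a rough illustration only, the first-phase settings map onto `transformers` training arguments approximately like this (the output directory and per-device batch split are assumptions, not values from the card):

```python
# Rough illustration of the phase-1 hyperparameters - not the ivrit-ai training config itself.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="whisper-yi-finetune",   # hypothetical output directory
    per_device_train_batch_size=4,      # assumption: 4 per GPU x 8 GPUs = effective batch size 32
    learning_rate=5e-6,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=4,
    bf16=True,                          # bf16 mixed precision
)
```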

#### Training Hardware / Duration

- **GPU type:** 8 x Nvidia A40 (single machine)
- **Duration:** ~5h run across both phases

## Evaluation

The Yiddish eval set is not yet published; an internal eval set was used.
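
WER (the metric listed in the card metadata) can be computed on any reference/hypothesis pairs; a minimal sketch using the `evaluate` library, assuming you already have transcriptions and references as lists of strings:

```python
# Minimal WER computation sketch using the Hugging Face `evaluate` library.
import evaluate

wer_metric = evaluate.load("wer")

references = ["example reference transcription"]  # hypothetical ground-truth texts
predictions = ["example model transcription"]     # hypothetical model outputs

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.3f}")
```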