---
library_name: transformers
license: apache-2.0
datasets:
- ivrit-ai/crowd-recital-yi-whisper-training
- ivrit-ai/crowd-whatsapp-yi-whisper-training
language:
- yi
metrics:
- wer
base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
---

# Model Card

This model is a Yiddish fine-tune (continued training) of the OpenAI Whisper Large v3 Turbo model.


## Model Details

### Model Description

- **Developed by:** ivrit-ai
- **Language(s) (NLP):** Yiddish
- **License:** Apache-2.0
- **Finetuned from model:** openai/whisper-large-v3-turbo
- **Training date:** October 2025

## Bias, Risks, and Limitations

The language detection capability of this model was degraded during training; it is intended for mostly-Yiddish audio transcription.
The language token should be explicitly set to Yiddish.

Additionally, the translation task was not trained and has also degraded; this model cannot translate in any reasonable capacity.

## How to Get Started with the Model

Please follow the original [model card](https://huggingface.co/openai/whisper-large-v3#usage) for usage details, replacing the model name with this one.
You can also find other weight formats and quantizations on the [ivrit-ai](https://huggingface.co/ivrit-ai) HF page.

We also provide simple example scripts for using this model's weights with other inference runtimes.
Find them in the ["examples"](https://github.com/ivrit-ai/asr-training/tree/master/examples) folder of the training GitHub repo.
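As a minimal sketch of that usage with the `transformers` pipeline (adapted from the openai/whisper-large-v3 example; `MODEL_ID` below is a placeholder for this repository's actual model ID):

```python
# Sketch of Yiddish transcription via the transformers ASR pipeline.
# MODEL_ID is a placeholder -- substitute this repository's actual model ID.
import torch
from transformers import pipeline

MODEL_ID = "ivrit-ai/<this-model>"  # placeholder

# Language detection is degraded in this fine-tune, so always force Yiddish.
GENERATE_KWARGS = {"language": "yiddish"}

def transcribe(audio_path: str) -> str:
    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        torch_dtype=torch.bfloat16,
        device="cuda:0" if torch.cuda.is_available() else "cpu",
    )
    return asr(audio_path, generate_kwargs=GENERATE_KWARGS)["text"]
```

Forcing the language via `generate_kwargs` bypasses Whisper's automatic language detection, which (as noted above) was degraded during training.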

## Training Details

### Training Data

This model was trained on the following datasets:

- [ivrit-ai/crowd-recital-yi-whisper-training](https://huggingface.co/datasets/ivrit-ai/crowd-recital-yi-whisper-training) - Crowd-sourced recordings of Wikipedia/Michlol article snippets (~78h)
- [ivrit-ai/crowd-whatsapp-yi-whisper-training](https://huggingface.co/datasets/ivrit-ai/crowd-whatsapp-yi-whisper-training) - Crowd-sourced, WhatsApp-based voice recordings of predefined prompts (~19h)

### Training Procedure

This model was trained in two main phases:
- Pre-training over both the recital and WhatsApp datasets.
- Post-training on the WhatsApp dataset only.

Training code can be found in the ivrit-ai GitHub repository [here](https://github.com/ivrit-ai/asr-training).

#### Preprocessing

The "Crowd Recital" and "WhatsApp" datasets contain timestamps and previous-segment text, following Whisper's expected input format.
Timestamps were kept for 50% of the samples from those datasets, and previous text was kept for 50% of the samples.
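A minimal sketch of that 50% conditioning scheme (function and field names here are illustrative, not taken from the actual preprocessing code):

```python
import random

def sample_conditioning(sample: dict, rng: random.Random) -> dict:
    """Independently decide, per training sample, whether to keep Whisper
    timestamp tokens and whether to prepend the previous-segment text."""
    return {
        **sample,
        "use_timestamps": rng.random() < 0.5,  # kept for ~50% of samples
        "prev_text": sample.get("prev_text", "") if rng.random() < 0.5 else "",
    }
```

Dropping each conditioning signal half the time teaches the model to work both with and without timestamps and prior context at inference time.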

Preprocessing code can be found within the training code [repository](https://github.com/ivrit-ai/asr-training).

Datasets were interleaved at a 0.915:0.085 ratio (recital:whatsapp) during the pre-training phase.

#### Training Hyperparameters

- **Training regime:** bf16 mixed precision with SDPA attention
- **Learning rate:** 5e-6, linear decay, 500 warmup steps, for 4 epochs; then an additional 200 steps on WhatsApp only with an LR of 1e-6
- **Batch size:** 32
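The two-phase schedule above could be expressed as trainer-style kwargs roughly as follows. This is a hypothetical reconstruction: only the numeric values come from this card, the key names follow transformers' `Seq2SeqTrainingArguments`, and whether 32 is the global or per-device batch size is not specified.

```python
# Hypothetical reconstruction of the two training phases as kwargs dicts.
phase1 = dict(  # recital + WhatsApp pre-training
    learning_rate=5e-6,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=4,
    per_device_train_batch_size=32,  # card says batch size 32; split unclear
    bf16=True,
)
phase2 = dict(  # WhatsApp-only post-training
    learning_rate=1e-6,
    max_steps=200,
    per_device_train_batch_size=32,
    bf16=True,
)
```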

#### Training Hardware / Duration

- **GPU type:** 8 × Nvidia A40
- **Duration:** ~5h total across both phases

## Evaluation

The Yiddish evaluation set is not yet published; an internal evaluation set was used.
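For reference, the reported metric (WER, per the card metadata) is word-level edit distance divided by reference length. A minimal self-contained sketch follows; production evaluations typically use a library such as jiwer or evaluate instead:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```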