ivrit-ai/yi-whisper-large-v3-turbo

+---
+library_name: transformers
+license: apache-2.0
+datasets:
+- ivrit-ai/crowd-recital-yi-whisper-training
+- ivrit-ai/crowd-whatsapp-yi-whisper-training
+language:
+- yi
+metrics:
+- wer
+base_model:
+- openai/whisper-large-v3-turbo
+pipeline_tag: automatic-speech-recognition
+---
+# Model Card for ivrit-ai/yi-whisper-large-v3-turbo
+This model is a Yiddish fine-tune (continued training) of the OpenAI Whisper Large v3 Turbo model.
+## Model Details
+### Model Description
+- **Developed by:** ivrit-ai
+- **Language(s) (NLP):** Yiddish
+- **License:** Apache-2.0
+- **Finetuned from model:** openai/whisper-large-v3-turbo
+- **Training Date:** October 2025
+## Bias, Risks, and Limitations
+The language detection capability of this model was degraded during training - it is intended for mostly-Yiddish audio transcription.
+The language token should be explicitly set to Yiddish.
+Additionally, the translation task was not trained and has also degraded. This model cannot translate in any reasonable capacity.
+## How to Get Started with the Model
+Please follow the original [model card](https://huggingface.co/openai/whisper-large-v3#usage) for usage details, replacing the model name there with this model's name.
+You can also find other weight formats and quantizations on the [ivrit-ai](https://huggingface.co/ivrit-ai) HF page.
+We created some simple example scripts that use this model and its weights with other inference runtimes.
+Find those in the ["examples"](https://github.com/ivrit-ai/asr-training/tree/master/examples) folder within the training GitHub repo.
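A minimal sketch of the usage described above, assuming the Hugging Face `transformers` ASR pipeline from the original model card and the model name `ivrit-ai/yi-whisper-large-v3-turbo` as shown on this repository's page. The helper names `build_generate_kwargs` and `transcribe` are illustrative and not part of any library. The language is pinned to Yiddish (`yi`) because language detection is degraded in this model.

```python
MODEL_ID = "ivrit-ai/yi-whisper-large-v3-turbo"  # repo name as shown on the model page


def build_generate_kwargs():
    """Decoding options: force Yiddish and the transcribe task."""
    # "yi" is Whisper's language code for Yiddish. Setting it explicitly
    # avoids relying on language auto-detection, which this model has lost.
    return {"language": "yi", "task": "transcribe"}


def transcribe(audio_path):
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so build_generate_kwargs() works without transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path, generate_kwargs=build_generate_kwargs())
    return result["text"]
```

For example, `transcribe("sample.wav")` would return the transcription of a local file, where `sample.wav` is a hypothetical path. For long-form audio options, follow the original model card.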
+## Training Details
+### Training Data
+This model was trained on the following datasets:
+- [ivrit-ai/crowd-recital-yi-whisper-training](https://huggingface.co/datasets/ivrit-ai/crowd-recital-yi-whisper-training) - Crowd-sourced recordings of Wikipedia/Michlol article snippets - ~78h
+- [ivrit-ai/crowd-whatsapp-yi-whisper-training](https://huggingface.co/datasets/ivrit-ai/crowd-whatsapp-yi-whisper-training) - Crowd-sourced WhatsApp-based voice recordings of predefined prompts - ~19h
+### Training Procedure
+This model was trained in two main phases:
+- Pre-training on both the Recital and Whatsapp datasets.
+- Post-training on the Whatsapp dataset only.
+Training code can be found on the ivrit-ai GitHub [here](https://github.com/ivrit-ai/asr-training).
+#### Preprocessing
+The "Crowd Recital" and "Whatsapp" datasets contain timestamps and previous text, following the input format Whisper expects.
+Timestamps were used for 50% of the samples from those datasets, and previous text was used for 50% of the samples.
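The sampling described above can be sketched as a per-sample random choice. This is an illustration only: the function name `choose_conditioning`, the independence of the two choices, and reading the second 50% as "half of the samples keep their previous text" are assumptions, not details confirmed by the training code.

```python
import random

TIMESTAMP_RATE = 0.5  # share of samples that keep timestamps (from the card)
PREV_TEXT_RATE = 0.5  # share of samples that keep previous text (assumed reading)


def choose_conditioning(rng):
    """Decide, per sample, which optional Whisper inputs to keep."""
    # Assumption: the two choices are drawn independently of each other.
    return {
        "with_timestamps": rng.random() < TIMESTAMP_RATE,
        "with_prev_text": rng.random() < PREV_TEXT_RATE,
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
flags = [choose_conditioning(rng) for _ in range(10_000)]
timestamp_share = sum(f["with_timestamps"] for f in flags) / len(flags)
prev_text_share = sum(f["with_prev_text"] for f in flags) / len(flags)
print(f"timestamps: {timestamp_share:.3f}, previous text: {prev_text_share:.3f}")
```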
+Preprocessing code can be found within the training code [repository](https://github.com/ivrit-ai/asr-training).
+The datasets were interleaved at a 0.915:0.085 ratio (recital:whatsapp) during the pre-training phase.
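To make the interleaving ratio concrete, here is a small pure-Python sketch that draws each training sample from one of two sources with the stated probabilities. The function `interleave` and the wrap-around policy for an exhausted source are hypothetical. The Hugging Face `datasets` library offers `interleave_datasets` with a `probabilities` argument for the same purpose, though whether the training code uses it is an assumption.

```python
import random

RECITAL_PROB = 0.915   # sampling probability for the recital dataset (from the card)
WHATSAPP_PROB = 0.085  # sampling probability for the Whatsapp dataset (from the card)


def interleave(recital, whatsapp, n_samples, seed=0):
    """Yield n_samples items, picking the source dataset at random each time."""
    rng = random.Random(seed)
    sources = {"recital": recital, "whatsapp": whatsapp}
    positions = {"recital": 0, "whatsapp": 0}
    for _ in range(n_samples):
        name = rng.choices(["recital", "whatsapp"],
                           weights=[RECITAL_PROB, WHATSAPP_PROB])[0]
        data = sources[name]
        # Wrap around when a source runs out (one possible policy, assumed here).
        item = data[positions[name] % len(data)]
        positions[name] += 1
        yield name, item


# Toy stand-ins for the real datasets.
recital = [f"recital-{i}" for i in range(100)]
whatsapp = [f"whatsapp-{i}" for i in range(100)]
mixed = list(interleave(recital, whatsapp, n_samples=10_000))
recital_share = sum(name == "recital" for name, _ in mixed) / len(mixed)
print(f"recital share: {recital_share:.3f}")
```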
+#### Training Hyperparameters
+- **Training regime:** bf16 mixed precision with SDPA (scaled dot-product attention)
+- **Learning Rate:** 5e-6 with linear decay and 500 warmup steps for 4 epochs, followed by an additional 200 steps on the Whatsapp dataset only at a learning rate of 1e-6
+- **Batch Size:** 32
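The learning-rate schedule listed above can be sketched as a linear warmup followed by a linear decay. This sketch assumes the decay ends at zero and uses a hypothetical `TOTAL_STEPS`, since the card does not state how many optimizer steps the 4 epochs took. The additional 200-step Whatsapp-only phase at 1e-6 is not modeled here.

```python
PEAK_LR = 5e-6      # peak learning rate (from the card)
WARMUP_STEPS = 500  # warmup length in optimizer steps (from the card)


def linear_warmup_decay_lr(step, total_steps, peak_lr=PEAK_LR,
                           warmup_steps=WARMUP_STEPS):
    """Learning rate at a step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 10_000  # hypothetical; not stated in the card
for step in (0, 250, 500, 5_250, 10_000):
    print(step, linear_warmup_decay_lr(step, TOTAL_STEPS))
```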
+#### Training Hardware / Duration
+- **GPU Type:** 8 x NVIDIA A40 (single machine)
+- **Duration:** ~5h run across both phases
+## Evaluation
+The Yiddish evaluation set is not yet published - an internal evaluation set was used.
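The card lists WER (word error rate) as its metric. As a reference for how that number is typically computed, here is a minimal sketch using a word-level edit distance. The function name `word_error_rate` is illustrative, and any text normalization applied in the internal evaluation is unknown and not modeled.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

With a four-word reference, one substituted word and one dropped word give a WER of 2/4 = 0.5.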