---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
- automatic-speech-recognition
- whisper
- urdu
- mozilla-foundation/common_voice_17_0
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
- cer
- bleu
- chrf
model-index:
- name: whisper-large-v3-urdu
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 17.0 (Urdu)
      type: mozilla-foundation/common_voice_17_0
      config: ur
      split: test
      args: ur
    metrics:
    - type: wer
      value: 26.019
      name: WER
    - type: cer
      value: 9.426
      name: CER
    - type: bleu
      value: 59.446
      name: BLEU
    - type: chrf
      value: 82.902
      name: ChrF
language:
- ur
pipeline_tag: automatic-speech-recognition
---
					
						
					
						
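# Whisper large V3 Urdu ASR Model 🥇

This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) for Urdu automatic speech recognition, trained on the Urdu (`ur`) configuration of the [mozilla-foundation/common_voice_17_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0) dataset.
At the final training step (1500), it achieves the following results on the evaluation set used during training:
- Loss: 0.0204
- WER: 21.4712
- CER: 7.1975

Scores from the standalone test-split evaluation are listed under Evaluation.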
					
						
## Quick Usage

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a speech-recognition pipeline.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-large-v3-urdu",
)

# Clear any forced decoder IDs and set the decoding language to Urdu.
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"

# Transcribe a local audio file; the result is a dict with a "text" key.
transcription = transcriber("audio2.mp3")
print(transcription)
```

Example output:

```text
{'text': 'دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے'}
```
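For recordings longer than Whisper's 30-second input window, the pipeline can split the audio into chunks and stitch the results together. The sketch below shows one way to do that and is not the author's own setup: the helper names `build_call_kwargs` and `transcribe_long_audio`, the 30-second chunk length, and the choice to request timestamps are illustrative assumptions. The `transformers` import is deferred so that the option builder can be inspected without loading the model.

```python
from typing import Any, Dict


def build_call_kwargs(language: str = "ur", chunk_length_s: int = 30) -> Dict[str, Any]:
    """Collect per-call options for a Whisper ASR pipeline (hypothetical helper)."""
    return {
        # Split long audio into windows of this many seconds.
        "chunk_length_s": chunk_length_s,
        # Ask for segment-level timestamps alongside the text.
        "return_timestamps": True,
        # Passed through to generate(): decode as Urdu, transcribe rather than translate.
        "generate_kwargs": {"language": language, "task": "transcribe"},
    }


def transcribe_long_audio(path: str, model_id: str) -> Dict[str, Any]:
    """Transcribe a long audio file with chunking (hypothetical helper)."""
    # Imported lazily so build_call_kwargs works without transformers installed.
    from transformers import pipeline

    transcriber = pipeline("automatic-speech-recognition", model=model_id)
    return transcriber(path, **build_call_kwargs())
```

Calling `transcribe_long_audio("long_recording.mp3", model_id)` with this model's repository ID (the file name here is a placeholder) should return a dict containing `text` and a `chunks` list of timestamped segments.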
					
						
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1500
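To make the schedule concrete, the following sketch reproduces the shape of a cosine schedule with linear warmup using the values listed above (peak learning rate 3e-05, 100 warmup steps, 1500 total steps). It is plain Python for illustration, not the Trainer's own code. The half-cosine decay to zero, the single-device assumption in the batch-size helper, and both function names are assumptions.

```python
import math


def cosine_lr_with_warmup(step: int, base_lr: float = 3e-5,
                          warmup_steps: int = 100, total_steps: int = 1500) -> float:
    """Learning rate at a given optimizer step (illustrative sketch)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def effective_batch_size(per_device: int = 8, grad_accum: int = 2,
                         num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


if __name__ == "__main__":
    for s in (0, 50, 100, 800, 1500):
        print(s, cosine_lr_with_warmup(s))
    print("effective batch size:", effective_batch_size())
```

Under these assumptions the rate peaks at step 100, falls to about half the peak at step 800, and reaches zero at step 1500. The batch-size helper reproduces the listed `total_train_batch_size` of 16.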
					
						
### Training results

| Training Loss | Epoch  | Step | Validation Loss | WER (%) | CER (%) |
|:-------------:|:------:|:----:|:---------------:|:-------:|:-------:|
| 0.0261        | 0.5089 | 300  | 0.0254          | 30.0224 | 10.3646 |
| 0.0211        | 1.0170 | 600  | 0.0226          | 25.8588 | 8.5780  |
| 0.0121        | 1.5259 | 900  | 0.0206          | 24.2158 | 7.9412  |
| 0.0093        | 2.0339 | 1200 | 0.0195          | 21.3032 | 7.2018  |
| 0.0043        | 2.5428 | 1500 | 0.0204          | 21.4712 | 7.1975  |
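The Epoch and Step columns together imply how many optimizer steps make up one pass over the training data. The sketch below derives that figure from the table rows and, combined with the total train batch size of 16, a rough training-set size. This is a back-of-the-envelope estimate, not a number reported by the author. It assumes single-device training and ignores any dropped or partial final batches, and the function names are illustrative.

```python
# (step, epoch) pairs copied from the training results table.
CHECKPOINTS = [(300, 0.5089), (600, 1.0170), (900, 1.5259),
               (1200, 2.0339), (1500, 2.5428)]


def steps_per_epoch(rows):
    """Average number of optimizer steps per epoch implied by the rows."""
    return sum(step / epoch for step, epoch in rows) / len(rows)


def approx_train_examples(rows, total_batch_size=16):
    """Rough training-set size: steps per epoch times examples per step."""
    return round(steps_per_epoch(rows) * total_batch_size)


if __name__ == "__main__":
    print("steps per epoch:", round(steps_per_epoch(CHECKPOINTS), 1))
    print("approx. training examples:", approx_train_examples(CHECKPOINTS))
```

Under these assumptions the rows imply roughly 590 steps per epoch, so the 1500 training steps cover a little over two and a half epochs, in line with the final Epoch value of 2.5428.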
					
						
### Framework versions

- Transformers 4.52.2
- PyTorch 2.7.1+cu126
- Datasets 3.4.1
- Tokenizers 0.21.2

---
					
						
## Evaluation

Urdu ASR evaluation on the Common Voice 17.0 test split:

| Metric   | Value   | Description                                 |
|----------|---------|---------------------------------------------|
| **WER**  | 26.019% | Word Error Rate (lower is better)           |
| **CER**  | 9.426%  | Character Error Rate (lower is better)      |
| **BLEU** | 59.446  | BLEU score (higher is better)               |
| **ChrF** | 82.902  | Character n-gram F-score (higher is better) |

>👉 Review the testing script: [Testing Whisper Large V3 Urdu](https://www.kaggle.com/code/kingabzpro/testing-urdu-whisper-large-v3)
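
As a reminder of what the two error rates measure, here is a minimal pure-Python sketch of WER and CER based on Levenshtein edit distance. It is independent of the linked testing script, which may use a metrics library and text normalization that this sketch does not replicate, so it will not necessarily reproduce the reported numbers. The function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute/insert/delete cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edits / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے"
    hyp = "دیکھیے پانی کب تک بہتا اور مچھلی کب تک ہے"  # one word dropped
    print("WER:", wer(ref, hyp))
    print("CER:", cer(ref, hyp))
```

Because both metrics divide by the reference length, a single dropped word costs more on a short sentence than on a long one, which is why corpus-level scores are usually computed over all utterances at once rather than averaged per sentence.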
					
						