Whisper Small - Dhivehi (Corrected Fine-tune)
Model: d3b4g/whisper-small-dv-corrected
Base model: openai/whisper-small
Language: Dhivehi (Thaana)
Dataset: Mozilla Common Voice 17.0 (dv)
Training Platform: RunPod RTX 4090
Training Epochs: 4.05
Best Checkpoint: 7000
Eval WER: 0.1107 (11.07%)
Eval Loss: 0.0055
Overview
This model is a fine-tuned version of OpenAI’s Whisper Small, trained specifically for Dhivehi (Thaana) speech recognition using the Mozilla Common Voice 17.0 dataset.
The goal of this project was to create an open, accurate Dhivehi ASR model suitable for transcription, voice assistants, and dataset generation.
On the evaluation split, the model achieved a Word Error Rate (WER) of 11.07%, indicating strong performance on clean and conversational speech.
Usage Example
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import soundfile as sf

# Load processor and model
processor = WhisperProcessor.from_pretrained("d3b4g/whisper-small-dv-corrected")
model = WhisperForConditionalGeneration.from_pretrained("d3b4g/whisper-small-dv-corrected")

# Load audio file (Whisper expects 16 kHz mono audio; resample beforehand if needed)
audio, rate = sf.read("sample.wav")
if audio.ndim > 1:  # down-mix stereo to mono
    audio = audio.mean(axis=1)

# Preprocess and predict
input_features = processor(audio, sampling_rate=rate, return_tensors="pt").input_features
predicted_ids = model.generate(input_features)

# Decode to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
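For quick experiments, the higher-level pipeline API is a convenient alternative. This is a minimal sketch under the assumption that ffmpeg is available for decoding file inputs; the pipeline resamples the audio to 16 kHz internally.

from transformers import pipeline

# Automatic speech recognition pipeline; handles decoding and resampling
asr = pipeline(
    "automatic-speech-recognition",
    model="d3b4g/whisper-small-dv-corrected",
    chunk_length_s=30,  # split long recordings into 30-second windows
)

result = asr("sample.wav")
print(result["text"])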
Evaluation results
- Word Error Rate (WER) on Mozilla Common Voice 17.0 (Dhivehi): 0.111 (self-reported)
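A rough sketch of how a comparable WER figure could be reproduced with the datasets and evaluate libraries is shown below. Common Voice 17.0 is gated on the Hub, so you must accept its terms and log in first; the subset size and any text normalization you apply may shift the number slightly.

import torch
import evaluate
from datasets import load_dataset, Audio
from transformers import WhisperProcessor, WhisperForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = WhisperProcessor.from_pretrained("d3b4g/whisper-small-dv-corrected")
model = WhisperForConditionalGeneration.from_pretrained("d3b4g/whisper-small-dv-corrected").to(device)

# Common Voice 17.0 is gated: accept the dataset terms and `huggingface-cli login` first
dataset = load_dataset("mozilla-foundation/common_voice_17_0", "dv", split="test")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

wer_metric = evaluate.load("wer")  # WER = (substitutions + deletions + insertions) / reference words
predictions, references = [], []
for sample in dataset.select(range(100)):  # small subset for a quick check
    inputs = processor(sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        ids = model.generate(inputs.input_features.to(device))
    predictions.append(processor.batch_decode(ids, skip_special_tokens=True)[0])
    references.append(sample["sentence"])

print(f"WER: {wer_metric.compute(predictions=predictions, references=references):.4f}")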