Whisper Small - Dhivehi (Corrected Fine-tune)

Model: d3b4g/whisper-small-dv-corrected
Base model: openai/whisper-small
Language: Dhivehi (Thaana)
Dataset: Mozilla Common Voice 17.0 (dv)
Training Platform: RunPod RTX 4090
Training Epochs: 4.05
Best Checkpoint: 7000
Eval WER: 0.1107 (≈ 11%)
Eval Loss: 0.0055


Overview

This model is a fine-tuned version of OpenAI’s Whisper Small, trained specifically for Dhivehi (Thaana) speech recognition using the Mozilla Common Voice 17.0 dataset.
The goal of this project was to create an open, accurate Dhivehi ASR model suitable for transcription, voice assistants, and dataset generation.

At its best checkpoint (step 7000), the model reached a Word Error Rate (WER) of 11.07% on the evaluation split, showing strong performance on clean speech and conversational recordings.
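
For a quick test, the Transformers pipeline API is usually enough; it downloads the model and handles audio decoding and resampling internally. A minimal sketch, assuming a local Dhivehi recording saved as sample.wav (a placeholder file name):

from transformers import pipeline

# Build a speech-recognition pipeline around this checkpoint
asr = pipeline("automatic-speech-recognition", model="d3b4g/whisper-small-dv-corrected")

# "sample.wav" is a placeholder for any local audio file
result = asr("sample.wav")
print(result["text"])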


Usage Example

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import soundfile as sf

# Load processor and model
processor = WhisperProcessor.from_pretrained("d3b4g/whisper-small-dv-corrected")
model = WhisperForConditionalGeneration.from_pretrained("d3b4g/whisper-small-dv-corrected")

# Load audio file (the Whisper feature extractor expects 16 kHz mono audio;
# see the resampling sketch after this example if your file uses another rate)
audio, rate = sf.read("sample.wav")

# Preprocess and predict
input_features = processor(audio, sampling_rate=rate, return_tensors="pt").input_features
predicted_ids = model.generate(input_features)

# Decode to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
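
The snippet above assumes the recording is already 16 kHz mono. If it is not, resample it before calling the processor. A minimal sketch using torchaudio; load_audio_16k is a hypothetical helper, not part of this repository:

import torchaudio

# Hypothetical helper: load an audio file, downmix to mono, and resample
# to the 16 kHz input rate Whisper's feature extractor expects.
def load_audio_16k(path):
    waveform, sr = torchaudio.load(path)               # shape: (channels, samples)
    if waveform.shape[0] > 1:
        waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
    if sr != 16000:
        waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=16000)
    return waveform.squeeze(0).numpy(), 16000

audio, rate = load_audio_16k("sample.wav")

The returned array and rate can then be passed to the processor exactly as in the example above.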

Dataset used to train d3b4g/whisper-small-dv-corrected

  • Mozilla Common Voice 17.0 (Dhivehi, dv)

Evaluation results

  • Word Error Rate (WER) on Mozilla Common Voice 17.0 (Dhivehi), self-reported: 0.111 (≈ 11.1%)
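
The self-reported number could in principle be reproduced with a short evaluation loop. A minimal sketch, assuming access to the gated mozilla-foundation/common_voice_17_0 dataset on the Hugging Face Hub and that its test split stores the reference text in the sentence column:

from datasets import load_dataset, Audio
import evaluate
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("d3b4g/whisper-small-dv-corrected")
model = WhisperForConditionalGeneration.from_pretrained("d3b4g/whisper-small-dv-corrected")
wer_metric = evaluate.load("wer")

# Common Voice 17.0, Dhivehi test split, decoded at 16 kHz
test_set = load_dataset("mozilla-foundation/common_voice_17_0", "dv", split="test")
test_set = test_set.cast_column("audio", Audio(sampling_rate=16000))

predictions, references = [], []
for sample in test_set:
    inputs = processor(sample["audio"]["array"], sampling_rate=16000, return_tensors="pt")
    pred_ids = model.generate(inputs.input_features)
    predictions.append(processor.batch_decode(pred_ids, skip_special_tokens=True)[0])
    references.append(sample["sentence"])

print("WER:", wer_metric.compute(predictions=predictions, references=references))

Exact numbers will depend on text normalization, which is not documented here, so small deviations from 0.111 are expected.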