Whisper Small - Dhivehi (Corrected Fine-tune)

Model: d3b4g/whisper-small-dv-corrected
Base model: openai/whisper-small
Language: Dhivehi (Thaana)
Dataset: Mozilla Common Voice 17.0 (dv)
Training Platform: RunPod RTX 4090
Training Epochs: 4.05
Best Checkpoint: 7000
Eval WER: 0.1107 (≈ 11%)
Eval Loss: 0.0055


Overview

This model is a fine-tuned version of OpenAI’s Whisper Small, trained specifically for Dhivehi (Thaana) speech recognition using the Mozilla Common Voice 17.0 dataset.
The goal of this project was to create an open, accurate Dhivehi ASR model suitable for transcription, voice assistants, and dataset generation.

At its best checkpoint (step 7000), the model reached a Word Error Rate (WER) of 11.07% on the evaluation split, showing strong performance on clean speech and conversational recordings.
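
For a quick test, the Transformers pipeline API is usually enough; it downloads the model and handles audio decoding and resampling internally. A minimal sketch, assuming a local Dhivehi recording saved as sample.wav (a placeholder file name):

from transformers import pipeline

# Build a speech-recognition pipeline around this checkpoint
asr = pipeline("automatic-speech-recognition", model="d3b4g/whisper-small-dv-corrected")

# "sample.wav" is a placeholder for any local audio file
result = asr("sample.wav")
print(result["text"])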


Usage Example

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import soundfile as sf

# Load processor and model
processor = WhisperProcessor.from_pretrained("d3b4g/whisper-small-dv-corrected")
model = WhisperForConditionalGeneration.from_pretrained("d3b4g/whisper-small-dv-corrected")

# Load audio file (the Whisper feature extractor expects 16 kHz mono audio;
# see the resampling sketch after this example if your file uses another rate)
audio, rate = sf.read("sample.wav")

# Preprocess and predict
input_features = processor(audio, sampling_rate=rate, return_tensors="pt").input_features
predicted_ids = model.generate(input_features)

# Decode to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
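
The snippet above assumes the recording is already 16 kHz mono. If it is not, resample it before calling the processor. A minimal sketch using torchaudio; load_audio_16k is a hypothetical helper, not part of this repository:

import torchaudio

# Hypothetical helper: load an audio file, downmix to mono, and resample
# to the 16 kHz input rate Whisper's feature extractor expects.
def load_audio_16k(path):
    waveform, sr = torchaudio.load(path)               # shape: (channels, samples)
    if waveform.shape[0] > 1:
        waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
    if sr != 16000:
        waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=16000)
    return waveform.squeeze(0).numpy(), 16000

audio, rate = load_audio_16k("sample.wav")

The returned array and rate can then be passed to the processor exactly as in the example above.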

Dataset used to train d3b4g/whisper-small-dv-corrected

  • Mozilla Common Voice 17.0 (Dhivehi, dv)

Evaluation results

  • Word Error Rate (WER) on Mozilla Common Voice 17.0 (Dhivehi), self-reported: 0.111 (≈ 11.1%)
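
The self-reported number could in principle be reproduced with a short evaluation loop. A minimal sketch, assuming access to the gated mozilla-foundation/common_voice_17_0 dataset on the Hugging Face Hub and that its test split stores the reference text in the sentence column:

from datasets import load_dataset, Audio
import evaluate
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("d3b4g/whisper-small-dv-corrected")
model = WhisperForConditionalGeneration.from_pretrained("d3b4g/whisper-small-dv-corrected")
wer_metric = evaluate.load("wer")

# Common Voice 17.0, Dhivehi test split, decoded at 16 kHz
test_set = load_dataset("mozilla-foundation/common_voice_17_0", "dv", split="test")
test_set = test_set.cast_column("audio", Audio(sampling_rate=16000))

predictions, references = [], []
for sample in test_set:
    inputs = processor(sample["audio"]["array"], sampling_rate=16000, return_tensors="pt")
    pred_ids = model.generate(inputs.input_features)
    predictions.append(processor.batch_decode(pred_ids, skip_special_tokens=True)[0])
    references.append(sample["sentence"])

print("WER:", wer_metric.compute(predictions=predictions, references=references))

Exact numbers will depend on text normalization, which is not documented here, so small deviations from 0.111 are expected.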