🏷️ Urdu Turn Detection Model — INT8 Quantized Variant (DistilBERT)

This repository contains the INT8-quantized version of the PuristanLabs1/urdu-turn-detection-distilbert turn detection model.
It classifies Urdu sentences into:

Complete → Speaker has finished speaking
Incomplete → Speaker intends to continue

🔍 Why Turn Detection Is Not the Same as VAD

Voice Activity Detection (VAD) detects whether speech is present in the audio — it only answers:
➡️ “Is the speaker making sound or not?”

It cannot determine whether a sentence is logically complete, because it relies solely on acoustic features like energy, pitch, and silence gaps.

Turn Detection, on the other hand, is a linguistic task:
➡️ “Is this utterance syntactically and semantically complete?”

Turn detection requires understanding:

Grammar
Sentence structure
Pausing patterns
Incomplete clauses
Pragmatic cues in text

Thus:

VAD → audio boundary detection
Turn Detection → meaning boundary detection

This model is designed specifically for dialogue systems, ASR post-processing, and real-time voice agents, where understanding when the user has finished their thought is critical.

⚡ About This Quantized Model

This model is obtained by applying dynamic INT8 quantization to the fine-tuned PuristanLabs1/urdu-turn-detection-distilbert end-of-turn (EoT) classifier.
Quantization reduces model size, memory footprint, and CPU latency at a small accuracy cost: F1 drops from 97.2% to 95.9% (see the base vs. quantized comparison).
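To make the idea of INT8 quantization concrete, here is a minimal pure-Python sketch of symmetric quantization with a single shared scale. It is a simplified illustration, not PyTorch's actual implementation; the function names and the example weights are hypothetical.

```python
# Simplified symmetric INT8 quantization of a small weight vector.
# Illustrative only: PyTorch's dynamic quantization does this internally
# and its exact scheme may differ.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.42, -1.27, 0.05, 0.9]  # hypothetical FP32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q)
print("scale:", scale, "max round-trip error:", max_error)
```

Each weight is stored in 1 byte instead of 4, at the cost of a rounding error of at most half the scale. Since the usage code quantizes only torch.nn.Linear layers, other parameters (such as embeddings) stay in FP32, which is one likely reason the file shrinks by roughly a quarter rather than by 4×.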

📉 Quantization Summary

Property	Value
Original Model Size	~516 MB
Quantized Model Size	393.11 MB
Original Latency (100 runs, total)	3.9496 seconds
Quantized Latency (100 runs, total)	1.4129 seconds
Speedup	~2.8× on CPU
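The per-inference latencies and the speedup follow directly from the totals in the table. The short calculation below reproduces them, assuming the 100-run figures are total wall-clock times and treating the approximate base size of 516 MB as exact.

```python
# Derive per-run latency, speedup, and size reduction from the reported totals.
runs = 100
base_total_s = 3.9496    # original model, total for 100 runs
quant_total_s = 1.4129   # quantized model, total for 100 runs

base_ms = base_total_s / runs * 1000
quant_ms = quant_total_s / runs * 1000
speedup = base_total_s / quant_total_s

base_size_mb = 516.0     # reported as "~516 MB", treated as exact here
quant_size_mb = 393.11
size_reduction = 1 - quant_size_mb / base_size_mb

print(f"base: {base_ms:.1f} ms/run, quantized: {quant_ms:.1f} ms/run")
print(f"speedup: {speedup:.1f}x, size reduction: {size_reduction:.0%}")
```

This works out to roughly 40 ms versus 14 ms per inference, and a size reduction of about 24%.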

📊 Quantized Model Performance (INT8)

Evaluation on 1,000 held-out Urdu samples:

Metric	Score
Accuracy	95.90%
F1 Score	95.91%
Precision	96.78%
Recall	95.06%

🔬 Detailed Classification Report

Class	Precision	Recall	F1-Score	Support
Incomplete	0.95	0.97	0.96	494
Complete	0.97	0.95	0.96	506
Overall Accuracy			0.96	1000
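The headline metrics and the per-class report can be cross-checked against a single confusion matrix. The counts below are not published with the model; they are inferred as the integers that reproduce the reported figures when "Complete" is treated as the positive class, so treat them as an assumption.

```python
# Confusion-matrix counts inferred from the reported metrics (an assumption,
# not published data). Positive class = Complete.
tp = 481  # Complete predicted as Complete
fn = 25   # Complete predicted as Incomplete
fp = 16   # Incomplete predicted as Complete
tn = 478  # Incomplete predicted as Incomplete

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)

# Metrics for the Incomplete class, obtained by swapping the roles.
inc_precision = tn / (tn + fn)
inc_recall = tn / (tn + fp)

print(f"accuracy={accuracy:.2%} precision={precision:.2%} "
      f"recall={recall:.2%} f1={f1:.2%}")
print(f"incomplete precision={inc_precision:.2f} recall={inc_recall:.2f}")
```

Under these counts the supports come out to 506 Complete and 494 Incomplete samples, matching the report, and the model errs slightly more often by calling a finished utterance Incomplete (25 cases) than the reverse (16 cases).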

🆚 Base vs Quantized Model

Variant	Description	Size	Latency (CPU)	F1 Score
Base	Fine-tuned DistilBERT	~516 MB	~40 ms	97.2%
Quantized (INT8)	Dynamic quantization	~393 MB	~14 ms	95.9%

🔧 Usage — Fast Inference with Quantized Model


import torch
from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification

model_name = "PuristanLabs1/urdu-turn-detection-distilbert-quantized"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load config
config = AutoConfig.from_pretrained(model_name)

# Build model architecture
model = AutoModelForSequenceClassification.from_config(config)

# Apply dynamic quantization
model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Load quantized weights
state_dict = torch.hub.load_state_dict_from_url(
    f"https://huggingface.co/{model_name}/resolve/main/quantized_model.pt",
    map_location="cpu",
)
model.load_state_dict(state_dict)
model.eval()  # from_config leaves the model in training mode; disable dropout

# Inference
text = "میں گھر جا رہا ہوں"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    score = probs[0][1].item()

label = "Complete" if score > 0.5 else "Incomplete"
print(label, score)

📚 Dataset & Training

Total samples: 10,000 Urdu utterances
Balanced: 50% Complete, 50% Incomplete
Includes:
- 2,825 real-world human-validated samples
- 7,175 synthetic high-quality Urdu sentences
Script: Nastaliq/Arabic Urdu only (no Roman Urdu)

Training was identical to the base model; quantization was applied after fine-tuning, and only the torch.nn.Linear layers were converted to INT8.

🎯 Intended Use Cases

Voice assistants (detect user turn-end)
Dialogue agents / Conversational AI
Real-time ASR segmentation
Voice bots & call center automation
Turn-taking prediction in multi-speaker systems

⚠️ Limitations

Text-only model (no acoustic cues)
Ambiguous short utterances (e.g., “ہاں؟”, “کیا؟”) may remain context-dependent
Not a replacement for VAD; best used together with VAD in pipelines
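
As a rough sketch of how the model might sit alongside VAD in a pipeline, the function below combines a silence duration (from any VAD) with the model's "Complete" probability. The function name, parameters, and all threshold values are hypothetical and would need tuning for a real system.

```python
# Hypothetical end-of-turn decision combining VAD silence with the
# text-based turn detection score. All thresholds are illustrative.

def should_end_turn(silence_ms, complete_prob,
                    min_silence_ms=300, max_silence_ms=1500, threshold=0.5):
    """Return True if the agent should treat the user's turn as finished."""
    if silence_ms < min_silence_ms:
        return False  # pause too short: user is probably still speaking
    if silence_ms >= max_silence_ms:
        return True   # long silence: end the turn regardless of the text
    # In between, let the linguistic score decide.
    return complete_prob > threshold

print(should_end_turn(silence_ms=500, complete_prob=0.92))
print(should_end_turn(silence_ms=500, complete_prob=0.10))
print(should_end_turn(silence_ms=2000, complete_prob=0.10))
```

The long-silence fallback keeps the agent from waiting indefinitely on utterances the text model scores as Incomplete, such as the ambiguous short utterances noted above.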

📝 License

MIT License
