Urdu Turn Detection Model: INT8 Quantized Variant (DistilBERT)
This repository contains the INT8-quantized version of the PuristanLabs1/urdu-turn-detection-distilbert turn detection model.
It classifies Urdu sentences into:
- Complete: the speaker has finished speaking
- Incomplete: the speaker intends to continue
Why Turn Detection Is Not the Same as VAD
Voice Activity Detection (VAD) detects whether speech is present in the audio. It only answers:
➡️ “Is the speaker making sound or not?”
It cannot determine whether a sentence is logically complete, because it relies solely on acoustic features like energy, pitch, and silence gaps.
Turn Detection, on the other hand, is a linguistic task:
➡️ “Is this utterance syntactically and semantically complete?”
Turn detection requires understanding:
- Grammar
- Sentence structure
- Pausing patterns
- Incomplete clauses
- Pragmatic cues in text
Thus:
- VAD → audio boundary detection
- Turn Detection → meaning boundary detection
This model is designed specifically for dialogue systems, ASR post-processing, and real-time voice agents, where understanding when the user has finished their thought is critical.
About This Quantized Model
This model was produced by applying dynamic INT8 quantization to the fine-tuned PuristanLabs1/urdu-turn-detection-distilbert end-of-turn (EoT) classifier.
Quantization reduces model size, memory footprint, and CPU latency at a small cost in accuracy: F1 drops from 97.2% for the base model to 95.9% for the quantized model.
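To give a feel for what INT8 quantization does to a weight tensor, here is a deliberately simplified sketch in plain Python. It assumes a symmetric, per-tensor scheme with a single scale factor; the function names `quantize_int8` and `dequantize` are hypothetical, and PyTorch's actual dynamic quantization kernels differ in detail (for example, activations are quantized on the fly at inference time).

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (illustrative only)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]


# Made-up example weights, not taken from the model.
weights = [0.82, -0.41, 0.05, -1.27, 0.333]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half the scale.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in one byte instead of four, which is where the size and memory savings come from; the rounding error is the source of the small accuracy loss.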
Quantization Summary
| Property | Value |
|---|---|
| Original Model Size | ~516 MB |
| Quantized Model Size | 393.11 MB |
| Original Latency (100 runs) | 3.9496 seconds |
| Quantized Latency (100 runs) | 1.4129 seconds |
| Speedup | 2.8× faster on CPU |
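The latency rows are totals over 100 runs, so the per-run latency and the speedup follow from simple division. The sketch below shows one way such a measurement could be taken and how the derived figures work out. The helper `time_runs` and its warm-up call are assumptions for illustration; the model card does not describe the exact benchmarking procedure.

```python
import time


def time_runs(fn, runs=100):
    """Return total wall-clock seconds for `runs` sequential calls to fn()."""
    fn()  # warm-up call, excluded from the measurement (an assumption)
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return time.perf_counter() - start


# Figures reported in the table above (total seconds over 100 runs, sizes in MB).
original_total, quantized_total, runs = 3.9496, 1.4129, 100
original_mb, quantized_mb = 516, 393.11

per_run_original_ms = original_total / runs * 1000
per_run_quantized_ms = quantized_total / runs * 1000
speedup = original_total / quantized_total
size_reduction = 1 - quantized_mb / original_mb

print(f"{per_run_original_ms:.1f} ms vs {per_run_quantized_ms:.1f} ms per run")
print(f"speedup {speedup:.2f}x, size reduction {size_reduction:.1%}")
```

In practice `fn` would wrap a single forward pass, for example `lambda: model(**inputs)` inside `torch.no_grad()`.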
Quantized Model Performance (INT8)
Evaluation on 1,000 held-out Urdu samples:
| Metric | Score |
|---|---|
| Accuracy | 95.90% |
| F1 Score | 95.91% |
| Precision | 96.78% |
| Recall | 95.06% |
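As a quick consistency check, the F1 score is the harmonic mean of precision and recall, and the reported values agree with that formula. The function name `f1_score` below is just a local helper, not a library call.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
f1 = f1_score(0.9678, 0.9506)
print(f"{f1:.4f}")
```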
Detailed Classification Report
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Incomplete | 0.95 | 0.97 | 0.96 | 494 |
| Complete | 0.97 | 0.95 | 0.96 | 506 |
| Overall Accuracy | | | 0.96 | 1000 |
Base vs Quantized Model
| Variant | Description | Size | Latency (CPU) | F1 Score |
|---|---|---|---|---|
| Base | Fine-tuned DistilBERT | ~516 MB | ~40 ms | 97.2% |
| Quantized (INT8) | Dynamic quantization | ~393 MB | ~14 ms | 95.9% |
Usage: Fast Inference with Quantized Model
```python
import torch
from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification

model_name = "PuristanLabs1/urdu-turn-detection-distilbert-quantized"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load config
config = AutoConfig.from_pretrained(model_name)

# Build the model architecture (weights are randomly initialised at this point)
model = AutoModelForSequenceClassification.from_config(config)

# Apply dynamic quantization so the module layout matches the saved state dict
model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Load quantized weights
state_dict = torch.hub.load_state_dict_from_url(
    f"https://huggingface.co/{model_name}/resolve/main/quantized_model.pt",
    map_location="cpu",
)
model.load_state_dict(state_dict)
model.eval()  # disable dropout for inference

# Inference
text = "میں گھر جا رہا ہوں"  # "I am going home"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)
score = probs[0][1].item()  # probability of class index 1, treated as "Complete"
label = "Complete" if score > 0.5 else "Incomplete"
print(label, score)
```
Dataset & Training
- Total samples: 10,000 Urdu utterances
- Balanced: 50% Complete, 50% Incomplete
- Includes:
  - 2,825 real-world human-validated samples
  - 7,175 synthetic high-quality Urdu sentences
- Script: Urdu in Arabic script (Nastaliq) only; no Roman Urdu
No additional training was performed: the weights come from the fine-tuned base model, and quantization was applied afterwards to the linear layers only.
Intended Use Cases
- Voice assistants (detect user turn-end)
- Dialogue agents / Conversational AI
- Real-time ASR segmentation
- Voice bots & call center automation
- Turn-taking prediction in multi-speaker systems
Limitations
- Text-only model (no acoustic cues)
- Ambiguous short utterances (e.g., “ہاں؟” (“yes?”), “کیا؟” (“what?”)) may remain context-dependent
- Not a replacement for VAD; best used together with VAD in pipelines
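One way to combine the two signals is to let VAD gate the decision and let the text classifier decide within a silence window. The sketch below is a hypothetical endpointing rule, not something shipped with this model: the function `should_end_turn`, the silence thresholds, and the probability threshold are all illustrative assumptions that would need tuning for a real system.

```python
def should_end_turn(p_complete, silence_ms,
                    min_silence_ms=300, max_silence_ms=1500, threshold=0.5):
    """Decide whether to treat the user's turn as finished.

    p_complete: probability of the "Complete" class from the text model.
    silence_ms: trailing silence duration reported by a VAD (assumed input).
    """
    if silence_ms < min_silence_ms:
        return False  # speaker is still talking or only pausing briefly
    if silence_ms >= max_silence_ms:
        return True   # long silence: stop waiting regardless of the text
    return p_complete > threshold  # in between, trust the text classifier


# A complete-sounding sentence ends the turn after a short pause,
# while an incomplete one keeps the agent waiting.
print(should_end_turn(p_complete=0.93, silence_ms=400))
print(should_end_turn(p_complete=0.12, silence_ms=400))
```

The upper silence bound acts as a fallback for the ambiguous short utterances mentioned above, where the text alone may not settle the question.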
License
MIT License