🏷️ Urdu Turn Detection Model: INT8 Quantized Variant (DistilBERT)

This repository contains the INT8-quantized version of the PuristanLabs1/urdu-turn-detection-distilbert turn detection model.
It classifies Urdu sentences into:

  • Complete → Speaker has finished speaking
  • Incomplete → Speaker intends to continue

πŸ” Why Turn Detection Is Not the Same as VAD

Voice Activity Detection (VAD) detects whether speech is present in the audio β€” it only answers:
➑️ β€œIs the speaker making sound or not?”

It cannot determine whether a sentence is logically complete, because it relies solely on acoustic features like energy, pitch, and silence gaps.

Turn Detection, on the other hand, is a linguistic task:
➡️ “Is this utterance syntactically and semantically complete?”

Turn detection requires understanding:

  • Grammar
  • Sentence structure
  • Pausing patterns
  • Incomplete clauses
  • Pragmatic cues in text

Thus:

  • VAD → audio boundary detection
  • Turn Detection → meaning boundary detection

This model is designed specifically for dialogue systems, ASR post-processing, and real-time voice agents, where understanding when the user has finished their thought is critical.


⚑ About This Quantized Model

This model was obtained by applying dynamic INT8 quantization to the fine-tuned PuristanLabs1/urdu-turn-detection-distilbert end-of-turn (EoT) classifier.
Quantization reduces model size, memory footprint, and CPU latency, at a cost of roughly 1.3 F1 points relative to the base model (97.2% to 95.9%).


📉 Quantization Summary

| Property | Value |
| --- | --- |
| Original Model Size | ~516 MB |
| Quantized Model Size | 393.11 MB |
| Original Latency (100 runs) | 3.9496 seconds |
| Quantized Latency (100 runs) | 1.4129 seconds |
| Speedup | 2.8× faster on CPU |
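The per-run latency, speedup, and size reduction follow directly from the figures in the summary. The short sketch below only redoes that arithmetic; the variable names are illustrative and nothing here measures the model itself.

```python
# Figures copied from the quantization summary above.
runs = 100
original_total_s = 3.9496    # base model, total for 100 runs
quantized_total_s = 1.4129   # INT8 model, total for 100 runs
original_size_mb = 516.0     # approximate, as reported
quantized_size_mb = 393.11

# Convert totals to per-run latency in milliseconds.
original_ms = original_total_s / runs * 1000
quantized_ms = quantized_total_s / runs * 1000

# Ratio of total times gives the speedup; ratio of sizes gives the reduction.
speedup = original_total_s / quantized_total_s
size_reduction_pct = (1 - quantized_size_mb / original_size_mb) * 100

print(f"per run: {original_ms:.1f} ms -> {quantized_ms:.1f} ms")
print(f"speedup: {speedup:.2f}x")
print(f"size reduction: {size_reduction_pct:.1f}%")
```

This works out to roughly 39.5 ms versus 14.1 ms per run, a ratio of about 2.8, and a size reduction of about 24%.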

📊 Quantized Model Performance (INT8)

Evaluation on 1,000 held-out Urdu samples:

| Metric | Score |
| --- | --- |
| Accuracy | 95.90% |
| F1 Score | 95.91% |
| Precision | 96.78% |
| Recall | 95.06% |
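As a quick consistency check, the reported F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. This is a minimal sketch that uses only the numbers above.

```python
# Reported precision and recall for the quantized model, as fractions.
precision = 0.9678
recall = 0.9506

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.4f}")
```

The result rounds to 0.9591, which agrees with the reported 95.91%.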

🔬 Detailed Classification Report

| Class | Precision | Recall | F1-Score | Support |
| --- | --- | --- | --- | --- |
| Incomplete | 0.95 | 0.97 | 0.96 | 494 |
| Complete | 0.97 | 0.95 | 0.96 | 506 |
| Overall Accuracy | | | 0.96 | 1000 |

🆚 Base vs Quantized Model

| Variant | Description | Size | Latency (CPU) | F1 Score |
| --- | --- | --- | --- | --- |
| Base | Fine-tuned DistilBERT | ~516 MB | ~40 ms | 97.2% |
| Quantized (INT8) | Dynamic quantization | ~393 MB | ~14 ms | 95.9% |

🔧 Usage: Fast Inference with Quantized Model


import torch
from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification

model_name = "PuristanLabs1/urdu-turn-detection-distilbert-quantized"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load config
config = AutoConfig.from_pretrained(model_name)

# Build model architecture
model = AutoModelForSequenceClassification.from_config(config)

# Apply dynamic quantization
model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Load quantized weights
state_dict = torch.hub.load_state_dict_from_url(
    f"https://huggingface.co/{model_name}/resolve/main/quantized_model.pt",
    map_location="cpu",
)
model.load_state_dict(state_dict)
model.eval()  # make sure dropout is disabled for inference

# Inference
text = "میں گھر جا رہا ہوں"  # "I am going home"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    score = probs[0][1].item()

label = "Complete" if score > 0.5 else "Incomplete"
print(label, score)
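The last step of the snippet above (softmax over two logits, then a 0.5 cutoff on index 1) can be isolated into a small helper so the decision threshold is easy to tune. The sketch below is pure Python and does not load the model. The function name `label_from_logits` and the `threshold` parameter are hypothetical, and it assumes, as the usage code does, that index 1 is the "Complete" class.

```python
import math

def label_from_logits(logits, threshold=0.5):
    """Map two raw logits [incomplete, complete] to (label, score).

    Assumes index 1 is the "Complete" class, as in the usage snippet.
    """
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    score = exps[1] / sum(exps)
    label = "Complete" if score > threshold else "Incomplete"
    return label, score

# Example with made-up logits; real values come from model(**inputs).logits.
print(label_from_logits([-1.2, 2.3]))
```

Raising `threshold` makes the system more reluctant to declare a turn finished, which may suit voice agents where interrupting the user is worse than waiting slightly longer.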

📚 Dataset & Training

  • Total samples: 10,000 Urdu utterances
  • Balanced: 50% Complete, 50% Incomplete
  • Includes:
    • 2,825 real-world human-validated samples
    • 7,175 synthetic high-quality Urdu sentences
  • Script: Nastaliq/Arabic Urdu only (no Roman Urdu)

Training was identical to the base model. Quantization was applied after training, and only the linear layers (torch.nn.Linear) were converted to INT8.
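Because the training data contains Arabic-script Urdu only, it may help to screen out Roman Urdu before calling the model. The following is a minimal sketch of such a guard. The function names and the 0.8 ratio are hypothetical choices, and the check only looks at the Unicode Arabic block (U+0600 to U+06FF), so it cannot tell Urdu apart from Arabic or Persian.

```python
def arabic_script_ratio(text):
    """Fraction of alphabetic characters that fall in the Arabic Unicode block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)

def looks_like_urdu_script(text, min_ratio=0.8):
    """Heuristic guard: True if most letters are Arabic-script characters."""
    return arabic_script_ratio(text) >= min_ratio

print(looks_like_urdu_script("میں گھر جا رہا ہوں"))       # Arabic-script Urdu
print(looks_like_urdu_script("main ghar ja raha hoon"))  # Roman Urdu
```

Inputs that fail the check could be rejected or routed to a different handler rather than scored by a model that never saw that script.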


🎯 Intended Use Cases

  • Voice assistants (detect user turn-end)
  • Dialogue agents / Conversational AI
  • Real-time ASR segmentation
  • Voice bots & call center automation
  • Turn-taking prediction in multi-speaker systems

⚠️ Limitations

  • Text-only model (no acoustic cues)
  • Ambiguous short utterances (e.g., “ہاں؟”, “کیا؟”) may remain context-dependent
  • Not a replacement for VAD; best used together with VAD in pipelines
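One way to use the model together with VAD is to let VAD gate the decision and let the text score settle the ambiguous middle range. The sketch below illustrates that idea under stated assumptions: the function name, the silence thresholds, and the probability cutoff are all hypothetical and would need tuning for a real pipeline.

```python
def should_end_turn(silence_ms, complete_prob,
                    min_silence_ms=300, max_silence_ms=1500, threshold=0.5):
    """Combine a VAD silence duration with the model's "Complete" probability.

    All default values are illustrative assumptions, not tuned settings.
    """
    if silence_ms < min_silence_ms:
        # VAD reports speech or only a very brief pause: keep listening.
        return False
    if silence_ms >= max_silence_ms:
        # Long silence: end the turn even if the text looks unfinished.
        return True
    # In between, defer to the text-based turn detector.
    return complete_prob > threshold

print(should_end_turn(silence_ms=500, complete_prob=0.93))
print(should_end_turn(silence_ms=500, complete_prob=0.20))
```

The long-silence fallback covers the ambiguous short utterances noted above, where the text alone may not resolve whether the speaker is done.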

πŸ“ License

MIT License
