DeBERTa-v3 for Explicit Hate Speech Detection

This is a DeBERTa-v3-base model fine-tuned for binary hate speech classification (HATE vs. NOT_HATE). It has been converted to ONNX format for high-performance inference.

This model was trained as part of a Master's project to identify the most robust hate speech classifier. While it performed exceptionally well on quantitative benchmarks, it has significant, well-documented limitations in detecting subtle or "coded" hate.

Author: Taiwo Ogun

Model Performance

The model was fine-tuned on the HateXplain dataset and evaluated against an out-of-domain (OOD) dataset (Davidson) to test real-world generalization. On OOD generalization it was the clear winner against four baselines (BERT, DistilBERT, RoBERTa, XLM-RoBERTa); in-domain, it was competitive with the strongest baseline.

F1-Score & Accuracy

| Model | F1 (In-Domain) | F1 (Out-of-Domain) | Accuracy (In-Domain) | Accuracy (Out-of-Domain) |
|---|---|---|---|---|
| DeBERTa-v3 (This Model) | 83.19% | 92.86% | 78.78% | 87.98% |
| BERT | 83.48% | 81.68% | 79.70% | 73.05% |
| DistilBERT | 83.13% | 80.01% | 78.68% | 71.22% |
| RoBERTa | 82.37% | 76.38% | 78.78% | 66.56% |
| XLM-RoBERTa | 81.58% | 66.69% | 77.59% | 56.36% |
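For reference, the reported metrics follow the standard binary-classification definitions. A minimal sketch of how they are typically computed (an assumption here: F1 is taken on the positive HATE class — the card does not state which averaging was used):

```python
# Illustrative F1 and accuracy for a binary hate-speech classifier.
# Assumption: F1 is computed on the positive (HATE) class.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred, positive="HATE"):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    fp = sum(p == positive and t != positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

y_true = ["HATE", "NOT_HATE", "HATE", "NOT_HATE"]
y_pred = ["HATE", "NOT_HATE", "NOT_HATE", "NOT_HATE"]
print(accuracy(y_true, y_pred))        # 0.75
print(round(f1(y_true, y_pred), 3))    # 0.667
```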

How to Use (with ONNX Runtime)

This model is in ONNX format. The easiest way to use it is with Hugging Face's optimum library.

# pip install "optimum[onnxruntime]"
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the ONNX model and tokenizer from the Hub
repo_id = "TaiwoOgun/deberta-v3-hate-speech-onnx"
model = ORTModelForSequenceClassification.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Create the pipeline
classifier = pipeline(
    "text-classification", 
    model=model, 
    tokenizer=tokenizer
)

# Run inference
texts = [
    "This is a wonderful, positive statement.",
    "Go back to where you came from."
]

predictions = classifier(texts)
print(predictions)
# Example output:
# [{'label': 'NOT_HATE', 'score': 0.8...},
#  {'label': 'NOT_HATE', 'score': 0.8...}]
# The second prediction slipping through reflects the model's documented
# weakness on subtle or "coded" hate.
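Under the hood, the pipeline applies a softmax over the model's two output logits and picks the argmax. If you run the ONNX session directly instead of via `pipeline`, you can reproduce that post-processing yourself. A minimal sketch — the id-to-label ordering below is an assumption; the authoritative mapping is the `id2label` field in the repo's config.json:

```python
import math

# Assumed label ordering; verify against config.json ("id2label").
ID2LABEL = {0: "NOT_HATE", 1: "HATE"}

def postprocess(logits):
    """Turn a raw logit pair into a pipeline-style {'label', 'score'} dict
    using a numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = probs.index(max(probs))
    return {"label": ID2LABEL[best], "score": probs[best]}

print(postprocess([2.0, -1.0]))  # → {'label': 'NOT_HATE', 'score': 0.95...}
```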