XLM-RoBERTa Safety Classifier (Italian & English)

Model Description

This is an XLM-RoBERTa-based binary text classification model fine-tuned to detect toxicity and insults in user queries. It is trained on a bilingual dataset (Italian and English) to distinguish between SAFE (benign) and UNSAFE (toxic/harmful) inputs.

  • Model Type: XLM-RoBERTa (fine-tuned)
  • Languages: Italian (it), English (en)
  • Task: Binary Classification
  • Training Dataset Size: 9,035 samples
  • Model Size: ~0.1B parameters (F32, Safetensors)
  • Created by: Famezz

Intended Use

This model is designed to act as a guardrail for chatbots and LLMs. It can be used to:

  1. Filter out toxic user inputs before they reach a Large Language Model (see the sketch after this list).
  2. Flag offensive content in user-generated text.
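As an illustration of the first use case, here is a minimal guardrail sketch. The 0.5 confidence threshold and the generate_reply function are illustrative assumptions standing in for your own downstream LLM call; they are not part of this model:

from transformers import pipeline

classifier = pipeline("text-classification", model="Famezz/roberta_safety_classifier")

def guarded_reply(user_input: str, threshold: float = 0.5) -> str:
    # Classify the input before it reaches the downstream LLM.
    result = classifier(user_input)[0]
    if result["label"] == "UNSAFE" and result["score"] >= threshold:
        # Refuse instead of forwarding the request.
        return "Sorry, I can't help with that."
    # generate_reply is a hypothetical placeholder for your own LLM call.
    return generate_reply(user_input)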

Label Mapping

The model is trained to predict the following string labels directly:

  • SAFE: Benign queries, general knowledge, small talk.
  • UNSAFE: Toxic content, insults, offensive language.
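
If you load the checkpoint without the pipeline, the same string labels can be read from the model configuration. The snippet below is a small sketch, assuming the labels are stored in the config's id2label mapping (as the pipeline output suggests):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("Famezz/roberta_safety_classifier")
# Index-to-label and label-to-index mappings saved with the checkpoint.
print(config.id2label)   # expected to contain 'SAFE' and 'UNSAFE'
print(config.label2id)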

Usage

You can use this model directly with the Hugging Face pipeline. The pipeline will automatically output the labels "SAFE" or "UNSAFE".

from transformers import pipeline

# Load the classifier
classifier = pipeline("text-classification", model="Famezz/roberta_safety_classifier")

# Test with English
print(classifier("How do I bake a cake?"))
# Output: [{'label': 'SAFE', 'score': 0.99}]

# Test with Italian ("You are an idiot")
print(classifier("Sei un idiota"))
# Output: [{'label': 'UNSAFE', 'score': 0.98}]
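
For batch scoring or custom thresholds, the tokenizer and model can also be called directly instead of going through the pipeline. The following is a sketch, assuming the checkpoint uses a standard sequence-classification head:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Famezz/roberta_safety_classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

texts = ["How do I bake a cake?", "Sei un idiota"]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities and look up the string labels from the config.
probs = torch.softmax(logits, dim=-1)
pred_ids = probs.argmax(dim=-1)
for text, pid, p in zip(texts, pred_ids, probs):
    print(text, "->", model.config.id2label[pid.item()], round(p[pid].item(), 3))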