xlm-roberta-large-text-entailment-88 - Multilingual Textual Entailment
Model Description
This model is a fine-tuned version of xlm-roberta-large for multilingual textual entailment (Natural Language Inference) on English and Korean text.
Task: Given a premise and a hypothesis, predict whether the hypothesis is:
- Entailment (0): The hypothesis is necessarily true given the premise
- Neutral (1): The hypothesis might be true given the premise
- Contradiction (2): The hypothesis is necessarily false given the premise
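For reference, the integer-to-label convention above is used again in the usage examples later in this card:

```python
# Integer label convention used by this model's classification head.
label_map = {0: "entailment", 1: "neutral", 2: "contradiction"}
```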
Intended Uses & Limitations
Intended Uses
- Textual entailment / Natural Language Inference tasks
- English and Korean language pairs
- Research and educational purposes
- Building NLU applications
Limitations
- Trained primarily on English (94%) and Korean (6%) data
- May not generalize well to other languages
- Performance may vary on out-of-domain text
- Not suitable for tasks requiring deep reasoning or external knowledge
Training Data
The model was trained on a multilingual dataset containing:
- English samples: ~382k
- Korean samples: ~25k
- Total samples: ~407k premise-hypothesis pairs
- Label distribution: Balanced (33% each class)
Data preprocessing:
- Case preservation (no lowercasing) for optimal performance with cased models
- Unicode normalization (NFC) for Korean text
- Special character cleanup
- Duplicate removal
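A minimal sketch of what these preprocessing steps could look like; the `preprocess` and `deduplicate` helpers below are illustrative, not the project's actual pipeline:

```python
import re
import unicodedata

def preprocess(text: str) -> str:
    """Sketch of the steps above: NFC-normalize (important for Korean),
    strip stray zero-width/control characters, collapse whitespace, and
    keep the original casing."""
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"[\u200b\u200c\u200d\ufeff]", "", text)  # zero-width chars
    text = re.sub(r"\s+", " ", text).strip()
    return text

def deduplicate(pairs):
    """Drop exact duplicate (premise, hypothesis, label) triples."""
    seen, unique = set(), []
    for premise, hypothesis, label in pairs:
        key = (preprocess(premise), preprocess(hypothesis), label)
        if key not in seen:
            seen.add(key)
            unique.append((premise, hypothesis, label))
    return unique
```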
Training Procedure
Training Hyperparameters
- Base model: xlm-roberta-large
- Learning rate: 1e-05
- Batch size: 32
- Gradient accumulation: 4 steps
- Effective batch size: 128
- Epochs: 15
- Max sequence length: 192
- Label smoothing: 0.1
- Weight decay: 0.01
- Warmup ratio: 10%
- Mixed precision: FP16
- Early stopping patience: 4 epochs
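As a rough orientation, these hyperparameters map onto Hugging Face `TrainingArguments` roughly as sketched below. This is an assumed reconstruction, not the exact training script, and some argument names (e.g. `eval_strategy` vs. the older `evaluation_strategy`) differ across transformers releases:

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Approximate mapping of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="xlmr-large-entailment",   # placeholder output directory
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,        # effective batch size 32 * 4 = 128
    num_train_epochs=15,
    weight_decay=0.01,
    warmup_ratio=0.1,
    label_smoothing_factor=0.1,
    fp16=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,          # required for early stopping
    metric_for_best_model="f1",
)

# Early stopping after 4 epochs without improvement, as listed above.
early_stopping = EarlyStoppingCallback(early_stopping_patience=4)
```

The max sequence length of 192 is applied at tokenization time rather than through `TrainingArguments`.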
Hardware
- GPU: NVIDIA H100/RTX 4060 Ti
- Training time: ~60-90 minutes
Evaluation Results
Overall Performance
| Metric | Score |
|---|---|
| F1 Score (weighted) | 0.8800 |
| Accuracy | 0.8800 |
Per-Class Performance
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| entailment | 0.87 | 0.90 | 0.89 | 8606 |
| neutral | 0.85 | 0.83 | 0.84 | 8530 |
| contradiction | 0.91 | 0.90 | 0.90 | 8600 |
| accuracy | | | 0.88 | 25736 |
| macro avg | 0.88 | 0.88 | 0.88 | 25736 |
| weighted avg | 0.88 | 0.88 | 0.88 | 25736 |
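The per-class table follows the scikit-learn classification-report layout. A report of this form can be reproduced from evaluation predictions as sketched below; `y_true` and `y_pred` are placeholders for the gold and predicted label IDs on the held-out split:

```python
from sklearn.metrics import classification_report, f1_score

# Placeholder label lists (0 = entailment, 1 = neutral, 2 = contradiction).
y_true = [0, 1, 2, 0]
y_pred = [0, 1, 2, 1]

print(classification_report(
    y_true, y_pred,
    target_names=["entailment", "neutral", "contradiction"],
))
print("weighted F1:", f1_score(y_true, y_pred, average="weighted"))
```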
Usage
Direct Usage (Transformers)
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "bekalebendong/xlm-roberta-large-text-entailment-88"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example inference
premise = "Orsay is one of the few Paris museums that is air-conditioned."
hypothesis = "The Orsay museum has air conditioning."

# Tokenize
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True, max_length=192)

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=1)
    label = torch.argmax(predictions, dim=1).item()

# Map to label
label_map = {0: "entailment", 1: "neutral", 2: "contradiction"}
print(f"Prediction: {label_map[label]} (confidence: {predictions[0][label]:.4f})")
```
Pipeline Usage
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="bekalebendong/xlm-roberta-large-text-entailment-88",
    tokenizer="bekalebendong/xlm-roberta-large-text-entailment-88",
)

# Premise/hypothesis pairs are passed as a dict with "text" and "text_pair".
result = classifier({
    "text": "Orsay is one of the few Paris museums that is air-conditioned.",
    "text_pair": "The Orsay museum has air conditioning.",
})
print(result)
```
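The pipeline reports labels as defined in the model's id2label config; if they appear as generic LABEL_0/LABEL_1/LABEL_2, map them back using the convention above (0 = entailment, 1 = neutral, 2 = contradiction). In recent transformers versions, passing `top_k=None` when calling the pipeline returns scores for all three classes instead of only the highest-scoring one.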
Citation
If you use this model in your research, please cite:
```bibtex
@misc{xlm-roberta-text-entailment,
  author       = {Your Name},
  title        = {Multilingual Textual Entailment with XLM-RoBERTa},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/bekalebendong/xlm-roberta-large-text-entailment-88}}
}
```
Acknowledgments
- Base model: xlm-roberta-large by FacebookAI
- Training framework: PyTorch + Hugging Face Transformers
- Optimization: Mixed precision training (FP16) with gradient accumulation
Model Card Authors
Your Name
Model Card Contact
For questions or issues, please open an issue on the model repository.