Esperanto -> Catalan, English, Spanish MT Model
Model description
This repository contains a multilingual MarianMT model for Esperanto → (English, Spanish, Catalan) translation using language tags.
Usage
The model is loaded and used with transformers as:
from transformers import MarianMTModel, MarianTokenizer
import torch

model_name = "Helsinki-NLP/opus-mt-eo-caenes"
device = "cuda" if torch.cuda.is_available() else "cpu"
model = MarianMTModel.from_pretrained(model_name).to(device)
tokenizer = MarianTokenizer.from_pretrained(model_name)

# Each source sentence starts with the tag of the desired target language.
source_texts = [
    ">>spa<< Saluton, kiel vi fartas?",
    ">>eng<< Saluton, kiel vi fartas?",
    ">>cat<< Saluton, kiel vi fartas?",
]

inputs = tokenizer(source_texts, return_tensors="pt", padding=True, truncation=True)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Pass the attention mask along with the input IDs so padded positions are ignored.
translated_ids = model.generate(**inputs)
translated_texts = tokenizer.batch_decode(translated_ids, skip_special_tokens=True)

for src, tgt in zip(source_texts, translated_texts):
    print(f"Source: {src} => Translated: {tgt}")
Supported target languages (via tags)
You control the target language by prefixing the source sentence with one of the following tags:
- >>eng<< → English
- >>spa<< → Spanish
- >>cat<< → Catalan
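
Building the tagged input by hand is error-prone, so a small helper can prepend the tag and reject unsupported codes. The sketch below is only an illustration: `SUPPORTED_TAGS` and `add_target_tag` are hypothetical names that are not part of the model or of transformers, and the tag format is assumed to be the one listed above.

```python
# Hypothetical helper, not part of the model repository or of transformers.
# Maps each target-language tag listed in this card to a readable name.
SUPPORTED_TAGS = {"eng": "English", "spa": "Spanish", "cat": "Catalan"}


def add_target_tag(text: str, lang: str) -> str:
    """Prefix an Esperanto sentence with the >>lang<< tag for the target language."""
    if lang not in SUPPORTED_TAGS:
        raise ValueError(
            f"Unsupported target language {lang!r}; expected one of {sorted(SUPPORTED_TAGS)}"
        )
    return f">>{lang}<< {text.strip()}"


sentence = "Saluton, kiel vi fartas?"
# One tagged copy of the sentence per supported target language.
tagged = [add_target_tag(sentence, lang) for lang in SUPPORTED_TAGS]
for line in tagged:
    print(line)
```

The resulting list can be passed to the tokenizer in place of a hand-written list of source texts.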
Training data
The model was trained using Tatoeba parallel data, with FLORES-200 used as the development set.
Training sentence-pair counts:
- ca-eo: 672,931
- es-eo: 4,677,945
- eo-en: 5,000,000
Evaluation on FLORES
| Language Pair | BLEU | ChrF++ |
|---|---|---|
| epo-spa | 19.98 | 49.11 |
| epo-cat | 28.35 | 55.42 |
| epo-eng | 37.47 | 63.09 |