Esperanto -> Catalan, English, Spanish MT Model

Model description

This repository contains a multilingual MarianMT model that translates Esperanto into English, Spanish, or Catalan. The target language is selected with a language tag prepended to each source sentence.

Usage

Load the model with the Hugging Face transformers library and translate a batch of tagged sentences as follows:

```python
from transformers import MarianMTModel, MarianTokenizer
import torch

model_name = "Helsinki-NLP/opus-mt-eo-caenes"

# Use a GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = MarianMTModel.from_pretrained(model_name).to(device)
tokenizer = MarianTokenizer.from_pretrained(model_name)

# Each source sentence starts with a tag that selects the target language
source_texts = [
    ">>spa<< Saluton, kiel vi fartas?",
    ">>eng<< Saluton, kiel vi fartas?",
    ">>cat<< Saluton, kiel vi fartas?",
]

# Tokenize the batch (padding to the longest sentence) and move all tensors to the device
inputs = tokenizer(source_texts, return_tensors="pt", padding=True, truncation=True).to(device)

# Pass the attention mask together with the input IDs so padding tokens are ignored
translated_ids = model.generate(**inputs)
translated_texts = tokenizer.batch_decode(translated_ids, skip_special_tokens=True)

for src, tgt in zip(source_texts, translated_texts):
    print(f"Source: {src} => Translated: {tgt}")
```

Supported target languages (via tags)

You control the target language by prefixing the source sentence with one of the following tags:

  • >>eng<< → English
  • >>spa<< → Spanish
  • >>cat<< → Catalan
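When the target language is chosen at runtime, a small helper can build the tagged inputs. The sketch below is illustrative only: `TARGET_TAGS` and `add_target_tag` are hypothetical names, not part of the model or the transformers API. Only the three tags listed above are assumed to be valid.

```python
# Hypothetical mapping from target language names to the model's tags
TARGET_TAGS = {"english": ">>eng<<", "spanish": ">>spa<<", "catalan": ">>cat<<"}


def add_target_tag(sentences: list[str], target: str) -> list[str]:
    """Prefix each Esperanto sentence with the tag for the requested target language."""
    try:
        tag = TARGET_TAGS[target.lower()]
    except KeyError:
        raise ValueError(
            f"Unsupported target {target!r}; expected one of {sorted(TARGET_TAGS)}"
        ) from None
    return [f"{tag} {s.strip()}" for s in sentences]
```

For example, `add_target_tag(["Saluton, kiel vi fartas?"], "Catalan")` returns `[">>cat<< Saluton, kiel vi fartas?"]`, which can be passed to the tokenizer exactly like `source_texts` in the usage example.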

Training data

The model was trained using Tatoeba parallel data, with FLORES-200 used as the development set.

Training sentence-pair counts:

  • ca-eo: 672,931
  • es-eo: 4,677,945
  • eo-en: 5,000,000
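The counts above are quite unbalanced across the three pairs. This short computation uses only the listed numbers to show each pair's share of the training data:

```python
# Sentence-pair counts as listed in the training-data section
pair_counts = {"ca-eo": 672_931, "es-eo": 4_677_945, "eo-en": 5_000_000}

total = sum(pair_counts.values())
print(f"total: {total:,}")
for pair, count in pair_counts.items():
    # Share of all training pairs, formatted as a percentage
    print(f"{pair}: {count / total:.2%} of the training pairs")
```

The total is 10,350,876 pairs. Catalan–Esperanto accounts for about 6.5% of them, compared with roughly 45% for Spanish–Esperanto and 48% for Esperanto–English.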

Evaluation on FLORES

| Language pair | BLEU  | chrF++ |
|---------------|-------|--------|
| epo-spa       | 19.98 | 49.11  |
| epo-cat       | 28.35 | 55.42  |
| epo-eng       | 37.47 | 63.09  |
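chrF++ scores character n-grams (up to length 6, whitespace removed) and word n-grams (up to length 2). It combines averaged precision and recall into an F-score that weights recall more heavily (β = 2). The sketch below is a simplified sentence-level version based on that definition. The function names are illustrative, and the sketch is not expected to reproduce the table's numbers exactly. Comparable scores require a reference implementation such as sacreBLEU with matching settings.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    # Character n-grams are extracted with all whitespace removed
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    # Word n-grams over whitespace-separated tokens
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def chrf_pp(hypothesis: str, reference: str,
            char_order: int = 6, word_order: int = 2, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF++ on a 0-100 scale (illustrative only)."""
    orders = [(char_ngrams, n) for n in range(1, char_order + 1)]
    orders += [(word_ngrams, n) for n in range(1, word_order + 1)]
    precisions, recalls = [], []
    for extract, n in orders:
        hyp, ref = extract(hypothesis, n), extract(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        matches = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    # Average precision and recall over all orders, then take the F-beta score
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)
```

An exact match scores 100: `chrf_pp("Saluton, kiel vi fartas?", "Saluton, kiel vi fartas?")` returns `100.0`. Strings that share no n-grams score 0.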
Model size: 76.9M parameters (Safetensors weights, F16 tensor type).