# Parallel-T5-Translation-PyTorch
## Introduction
Parallel-T5-Translation-PyTorch is a custom, optimized Transformer-based sequence-to-sequence model inspired by T5-Small, developed for English-to-French Machine Translation.
The core innovation in this project is the Parallel Multi-Head Attention mechanism, designed to enable experimentation with model parallelism and improve attention efficiency. This implementation provides a foundation for studying how attention heads can be executed concurrently to enhance performance in translation tasks.
## Custom Model Architecture: Parallel Attention

### Overview
Our model, ParallelT5Small, replaces the standard Multi-Head Attention (MHA) with a novel Parallel Multi-Head Attention (P-MHA) layer.
- **Standard MHA:** Computes one set of Query (Q), Key (K), and Value (V) projections, then splits the resulting vectors across all heads.
- **Parallel MHA (Proposed):** Splits the attention mechanism into two parallel streams, each using separate Q/K/V projection weights for half of the attention heads. The results from both parallel streams are independently projected back to the hidden dimension and then summed to form the final attention output.
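To make the comparison concrete, here is a minimal sketch of what such a layer could look like in PyTorch. The class and parameter names (`ParallelMultiHeadAttention`, `d_model`, `num_heads`) are illustrative assumptions, not necessarily the identifiers used in this repository:

```python
# Hedged sketch of the P-MHA idea, not the repository's actual implementation.
import math

import torch.nn as nn
import torch.nn.functional as F


class ParallelMultiHeadAttention(nn.Module):
    """Two independent attention streams, each owning half of the heads."""

    def __init__(self, d_model: int, num_heads: int, dropout: float = 0.1):
        super().__init__()
        assert num_heads % 2 == 0, "num_heads must be even to split into two streams"
        assert d_model % num_heads == 0
        self.heads_per_stream = num_heads // 2
        self.head_dim = d_model // num_heads
        stream_dim = self.heads_per_stream * self.head_dim  # half of d_model
        # Each stream owns its own Q/K/V projections for its half of the heads
        # and its own output projection back to the hidden dimension.
        self.streams = nn.ModuleList([
            nn.ModuleDict({
                "q": nn.Linear(d_model, stream_dim),
                "k": nn.Linear(d_model, stream_dim),
                "v": nn.Linear(d_model, stream_dim),
                "o": nn.Linear(stream_dim, d_model),
            })
            for _ in range(2)
        ])
        self.dropout = nn.Dropout(dropout)

    def _attend(self, proj, query, key, value, mask=None):
        bsz, q_len, _ = query.shape
        k_len = key.shape[1]
        # (batch, seq, stream_dim) -> (batch, heads_per_stream, seq, head_dim)
        q = proj["q"](query).view(bsz, q_len, self.heads_per_stream, self.head_dim).transpose(1, 2)
        k = proj["k"](key).view(bsz, k_len, self.heads_per_stream, self.head_dim).transpose(1, 2)
        v = proj["v"](value).view(bsz, k_len, self.heads_per_stream, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = self.dropout(F.softmax(scores, dim=-1))
        out = (attn @ v).transpose(1, 2).reshape(bsz, q_len, -1)
        return proj["o"](out)  # project this stream back to d_model

    def forward(self, query, key, value, mask=None):
        # The two streams are independent; their outputs are summed into the final result.
        return sum(self._attend(s, query, key, value, mask) for s in self.streams)
```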
### Goal
This architecture serves as a foundation for:
- Exploring architectural variants of the Transformer.
- Studying the effects of parallelized attention on translation performance.
- Investigating scalability in distributed training and efficiency on specialized hardware (e.g., GPUs or TPUs).
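As one hedged illustration of the kind of concurrency experiment this design enables (this is not code from the repository), the two attention streams could be dispatched on separate CUDA streams so their kernels can overlap on a single GPU:

```python
# Illustrative only: overlapping the two P-MHA streams with CUDA streams.
import torch

def run_streams_concurrently(stream_fn_a, stream_fn_b, *inputs):
    """Run two independent attention streams on separate CUDA streams and sum the results."""
    if not torch.cuda.is_available():
        return stream_fn_a(*inputs) + stream_fn_b(*inputs)  # CPU fallback: sequential

    s_a, s_b = torch.cuda.Stream(), torch.cuda.Stream()
    current = torch.cuda.current_stream()
    # Make sure both side streams see the inputs produced on the current stream.
    s_a.wait_stream(current)
    s_b.wait_stream(current)
    with torch.cuda.stream(s_a):
        out_a = stream_fn_a(*inputs)
    with torch.cuda.stream(s_b):
        out_b = stream_fn_b(*inputs)
    # The current stream must wait for both side streams before combining the outputs.
    current.wait_stream(s_a)
    current.wait_stream(s_b)
    # (A production version would also call Tensor.record_stream for safe memory reuse.)
    return out_a + out_b
```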
## Training & Evaluation Metrics (Epoch 37)
| Metric | Train Result | Validation Result | Goal |
|---|---|---|---|
| Loss (Cross-Entropy) | 4.2213 | 4.8907 | Decrease loss below 2.0 |
| Token Accuracy | 18.18% | 15.20% | Achieve 60%+ |
| BLEU Score | To be implemented | To be implemented | Target: 30–40 |
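BLEU is listed as "To be implemented"; a minimal sketch of corpus-level BLEU scoring with the `sacrebleu` package (an assumed extra dependency, not necessarily in `requirements.txt`) could look like this:

```python
# Hypothetical BLEU evaluation sketch; not yet part of this repository.
import sacrebleu

def corpus_bleu_score(hypotheses, references):
    """Corpus-level BLEU from lists of detokenized hypothesis/reference strings."""
    # sacrebleu expects the hypotheses plus a list of reference streams.
    return sacrebleu.corpus_bleu(hypotheses, [references]).score

# Example:
# print(corpus_bleu_score(["le chat est sur le tapis"], ["le chat est sur le tapis"]))  # -> 100.0
```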
## Installation and Setup

### Installation
To set up the project locally, follow these steps.
Python 3.8+ is required.
1. **Clone the repository**

   ```bash
   git clone https://github.com/YourUsername/Parallel-T5-Translation-PyTorch.git
   cd Parallel-T5-Translation-PyTorch
   ```

2. **Create and activate a conda environment**

   ```bash
   conda create -n parallel-t5 python=3.9
   conda activate parallel-t5
   ```

3. **Install dependencies**

   ```bash
   # Install PyTorch (use the appropriate CUDA version for your setup)
   pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
   # Install project dependencies
   pip install -r requirements.txt
   ```
## Training and Preprocessing
The workflow consists of two main steps: data preparation and model training.
### Step 1: Data Preprocessing
This step performs the following:
- Downloads the GlobalVoices EN-FR dataset
- Tokenizes data using the T5 tokenizer
- Splits into train, validation, and test sets
- Saves processed tensors to `./data/processed`
Run the preprocessing:

```bash
python run.py
```
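For orientation, here is a rough sketch of what the tokenization step might look like with the Hugging Face `AutoTokenizer` and the `t5-small` checkpoint. The function name, task prefix, and `max_length` value are illustrative assumptions, not necessarily what `run.py` does:

```python
# Illustrative sketch only; the actual preprocessing lives in run.py.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")

def encode_pair(en_sentence: str, fr_sentence: str, max_length: int = 128):
    """Tokenize one English/French sentence pair into fixed-length tensors."""
    # T5 is text-to-text, so a task prefix is commonly prepended to the source sentence.
    source = tokenizer(
        "translate English to French: " + en_sentence,
        max_length=max_length, padding="max_length", truncation=True, return_tensors="pt",
    )
    target = tokenizer(
        fr_sentence,
        max_length=max_length, padding="max_length", truncation=True, return_tensors="pt",
    )
    return source["input_ids"], source["attention_mask"], target["input_ids"]

# Example:
# ids, mask, labels = encode_pair("Hello world.", "Bonjour le monde.")
# The resulting tensors could then be saved with torch.save under ./data/processed.
```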
## References
This project is built upon the foundational work of the T5 model and utilizes the publicly available GlobalVoices dataset.
### T5 (Text-to-Text Transfer Transformer)

The model architecture is heavily inspired by the T5 framework, which casts all NLP problems into a text-to-text format.

Paper: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. *Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer* (2019). https://arxiv.org/abs/1910.10683
### OPUS - GlobalVoices Dataset

The parallel English-French data used for training is sourced from the OPUS collection's GlobalVoices corpus.

Resource: Jörg Tiedemann. The OPUS Parallel Corpus (2012). Link (GlobalVoices source): https://object.pouta.csc.fi/OPUS-GlobalVoices/v2018q4/moses/en-fr.txt.zip
### Hugging Face Transformers Library

The AutoTokenizer and several best practices for transformer training and dataset handling are derived from the Hugging Face ecosystem.

Library: https://huggingface.co/docs/transformers/index