# Parallel-T5-Translation-PyTorch
## Introduction
Parallel-T5-Translation-PyTorch is a custom, optimized Transformer-based sequence-to-sequence model inspired by T5-Small, developed for English-to-French Machine Translation.
The core innovation in this project is the Parallel Multi-Head Attention mechanism, designed to enable experimentation with model parallelism and improve attention efficiency. This implementation provides a foundation for studying how attention heads can be executed concurrently to enhance performance in translation tasks.
## Custom Model Architecture: Parallel Attention

### Overview
Our model, ParallelT5Small, replaces the standard Multi-Head Attention (MHA) with a novel Parallel Multi-Head Attention (P-MHA) layer.
- **Standard MHA:** Computes one set of Query (Q), Key (K), and Value (V) projections, then splits the resulting vectors across all heads.
- **Parallel MHA (Proposed):** Splits the attention mechanism into two parallel streams, each using separate Q/K/V projection weights for half of the attention heads. The results from both parallel streams are independently projected back to the hidden dimension and then summed to form the final attention output.
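To make the comparison concrete, here is a minimal sketch of what such a layer could look like in PyTorch. The class and parameter names (`ParallelMultiHeadAttention`, `d_model`, `num_heads`) are illustrative assumptions, not necessarily the identifiers used in this repository:

```python
# Hedged sketch of the P-MHA idea, not the repository's actual implementation.
import math

import torch.nn as nn
import torch.nn.functional as F


class ParallelMultiHeadAttention(nn.Module):
    """Two independent attention streams, each owning half of the heads."""

    def __init__(self, d_model: int, num_heads: int, dropout: float = 0.1):
        super().__init__()
        assert num_heads % 2 == 0, "num_heads must be even to split into two streams"
        assert d_model % num_heads == 0
        self.heads_per_stream = num_heads // 2
        self.head_dim = d_model // num_heads
        stream_dim = self.heads_per_stream * self.head_dim  # half of d_model
        # Each stream owns its own Q/K/V projections for its half of the heads
        # and its own output projection back to the hidden dimension.
        self.streams = nn.ModuleList([
            nn.ModuleDict({
                "q": nn.Linear(d_model, stream_dim),
                "k": nn.Linear(d_model, stream_dim),
                "v": nn.Linear(d_model, stream_dim),
                "o": nn.Linear(stream_dim, d_model),
            })
            for _ in range(2)
        ])
        self.dropout = nn.Dropout(dropout)

    def _attend(self, proj, query, key, value, mask=None):
        bsz, q_len, _ = query.shape
        k_len = key.shape[1]
        # (batch, seq, stream_dim) -> (batch, heads_per_stream, seq, head_dim)
        q = proj["q"](query).view(bsz, q_len, self.heads_per_stream, self.head_dim).transpose(1, 2)
        k = proj["k"](key).view(bsz, k_len, self.heads_per_stream, self.head_dim).transpose(1, 2)
        v = proj["v"](value).view(bsz, k_len, self.heads_per_stream, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = self.dropout(F.softmax(scores, dim=-1))
        out = (attn @ v).transpose(1, 2).reshape(bsz, q_len, -1)
        return proj["o"](out)  # project this stream back to d_model

    def forward(self, query, key, value, mask=None):
        # The two streams are independent; their outputs are summed into the final result.
        return sum(self._attend(s, query, key, value, mask) for s in self.streams)
```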
### Goal
This architecture serves as a foundation for:
- Exploring architectural variants of the Transformer.
- Studying the effects of parallelized attention on translation performance.
- Investigating scalability in distributed training and efficiency on specialized hardware (e.g., GPUs or TPUs).
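As one hedged illustration of the kind of concurrency experiment this design enables (this is not code from the repository), the two attention streams could be dispatched on separate CUDA streams so their kernels can overlap on a single GPU:

```python
# Illustrative only: overlapping the two P-MHA streams with CUDA streams.
import torch

def run_streams_concurrently(stream_fn_a, stream_fn_b, *inputs):
    """Run two independent attention streams on separate CUDA streams and sum the results."""
    if not torch.cuda.is_available():
        return stream_fn_a(*inputs) + stream_fn_b(*inputs)  # CPU fallback: sequential

    s_a, s_b = torch.cuda.Stream(), torch.cuda.Stream()
    current = torch.cuda.current_stream()
    # Make sure both side streams see the inputs produced on the current stream.
    s_a.wait_stream(current)
    s_b.wait_stream(current)
    with torch.cuda.stream(s_a):
        out_a = stream_fn_a(*inputs)
    with torch.cuda.stream(s_b):
        out_b = stream_fn_b(*inputs)
    # The current stream must wait for both side streams before combining the outputs.
    current.wait_stream(s_a)
    current.wait_stream(s_b)
    # (A production version would also call Tensor.record_stream for safe memory reuse.)
    return out_a + out_b
```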
## Training & Evaluation Metrics (Epoch 37)
| Metric | Train Result | Validation Result | Goal |
|---|---|---|---|
| Loss (Cross-Entropy) | 4.2213 | 4.8907 | Decrease loss below 2.0 |
| Token Accuracy | 18.18% | 15.20% | Achieve 60%+ |
| BLEU Score | To be implemented | To be implemented | Target: 30–40 |
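BLEU is listed as "To be implemented"; a minimal sketch of corpus-level BLEU scoring with the `sacrebleu` package (an assumed extra dependency, not necessarily in `requirements.txt`) could look like this:

```python
# Hypothetical BLEU evaluation sketch; not yet part of this repository.
import sacrebleu

def corpus_bleu_score(hypotheses, references):
    """Corpus-level BLEU from lists of detokenized hypothesis/reference strings."""
    # sacrebleu expects the hypotheses plus a list of reference streams.
    return sacrebleu.corpus_bleu(hypotheses, [references]).score

# Example:
# print(corpus_bleu_score(["le chat est sur le tapis"], ["le chat est sur le tapis"]))  # -> 100.0
```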
## Installation and Setup

### Installation
To set up the project locally, follow these steps.
Python 3.8+ is required.
1. **Clone the repository**

   ```bash
   git clone https://github.com/YourUsername/Parallel-T5-Translation-PyTorch.git
   cd Parallel-T5-Translation-PyTorch
   ```

2. **Create and activate a conda environment**

   ```bash
   conda create -n parallel-t5 python=3.9
   conda activate parallel-t5
   ```

3. **Install dependencies**

   ```bash
   # Install PyTorch (use the appropriate CUDA version for your setup)
   pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
   # Install project dependencies
   pip install -r requirements.txt
   ```
## Training and Preprocessing
The workflow consists of two main steps: data preparation and model training.
### Step 1: Data Preprocessing
This step performs the following:
- Downloads the GlobalVoices EN-FR dataset
- Tokenizes data using the T5 tokenizer
- Splits into train, validation, and test sets
- Saves processed tensors to `./data/processed`
Run the preprocessing:

```bash
python run.py
```
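For orientation, here is a rough sketch of what the tokenization step might look like with the Hugging Face `AutoTokenizer` and the `t5-small` checkpoint. The function name, task prefix, and `max_length` value are illustrative assumptions, not necessarily what `run.py` does:

```python
# Illustrative sketch only; the actual preprocessing lives in run.py.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")

def encode_pair(en_sentence: str, fr_sentence: str, max_length: int = 128):
    """Tokenize one English/French sentence pair into fixed-length tensors."""
    # T5 is text-to-text, so a task prefix is commonly prepended to the source sentence.
    source = tokenizer(
        "translate English to French: " + en_sentence,
        max_length=max_length, padding="max_length", truncation=True, return_tensors="pt",
    )
    target = tokenizer(
        fr_sentence,
        max_length=max_length, padding="max_length", truncation=True, return_tensors="pt",
    )
    return source["input_ids"], source["attention_mask"], target["input_ids"]

# Example:
# ids, mask, labels = encode_pair("Hello world.", "Bonjour le monde.")
# The resulting tensors could then be saved with torch.save under ./data/processed.
```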
## References
This project is built upon the foundational work of the T5 model and utilizes the publicly available GlobalVoices dataset.
### T5 (Text-to-Text Transfer Transformer)

The model architecture is heavily inspired by the T5 framework, which casts all NLP problems into a text-to-text format.

Paper: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. *Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer* (2019). https://arxiv.org/abs/1910.10683
### OPUS - GlobalVoices Dataset

The parallel English-French data used for training is sourced from the OPUS collection's GlobalVoices corpus.

Resource: Jörg Tiedemann. The OPUS Parallel Corpus (2012). Link (GlobalVoices source): https://object.pouta.csc.fi/OPUS-GlobalVoices/v2018q4/moses/en-fr.txt.zip
### Hugging Face Transformers Library

The AutoTokenizer and several best practices for transformer training and dataset handling are derived from the Hugging Face ecosystem.

Library: https://huggingface.co/docs/transformers/index