# English → Hindi Translation with Seq2Seq + Multi-Head Attention
This Streamlit Space demonstrates the power of LSTM-based models with attention mechanisms for sequence-to-sequence (Seq2Seq) tasks. Specifically, it showcases multi-head cross-attention in a translation setting.
## Purpose
This Space is designed to illustrate how LSTM-based Seq2Seq models combined with attention mechanisms can perform language translation. It is intended for educational and demonstration purposes, highlighting:
- Encoder-Decoder architecture using LSTMs
- Multi-head attention for better context understanding
- Sequence-to-sequence translation from English to Hindi
- Comparison between smaller (12M parameters) and larger (42M parameters) models
## Models
| Model | Parameters | Vocabulary | Training Data | Repository |
|---|---|---|---|---|
| Model A | 12M | 50k | 20k rows | seq2seq-lstm-multiheadattention-12.3 |
| Model B | 42M | 256k | 100k rows | seq2seq-lstm-multiheadattention-42 |
- Model A performs better on the smaller dataset it was trained on.
- Model B has higher capacity but requires more diverse data to generalize well.
## Features
- Select a model size (12M or 42M parameters)
- View model architecture layer-by-layer
- Choose a sentence from the dataset to translate
- Compare original vs predicted translation
- Highlight how multi-head attention improves Seq2Seq performance
## How it Works
Encoder:
- Processes the input English sentence
- Embedding → Layer Normalization → Dropout → BiLSTM → Hidden states
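A minimal Keras sketch of this encoder pipeline is shown below. The layer sizes (vocabulary, embedding dimension, LSTM units) are illustrative assumptions, not the Space's actual configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_encoder(vocab_size=50_000, embed_dim=256, lstm_units=256):
    # Illustrative sizes only; the deployed models may use different values.
    tokens = layers.Input(shape=(None,), dtype="int32", name="encoder_tokens")
    x = layers.Embedding(vocab_size, embed_dim)(tokens)   # Embedding
    x = layers.LayerNormalization()(x)                    # Layer Normalization
    x = layers.Dropout(0.1)(x)                            # Dropout
    # BiLSTM: per-token outputs plus forward/backward hidden and cell states
    outputs, fh, fc, bh, bc = layers.Bidirectional(
        layers.LSTM(lstm_units, return_sequences=True, return_state=True)
    )(x)
    state_h = layers.Concatenate()([fh, bh])  # hidden state handed to the decoder
    state_c = layers.Concatenate()([fc, bc])  # cell state handed to the decoder
    return tf.keras.Model(tokens, [outputs, state_h, state_c], name="encoder")
```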
Decoder:
- Receives previous token embeddings and encoder states
- Applies multi-head cross-attention over encoder outputs
- Generates the next token until the `<end>` token is reached
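The sketch below outlines such a decoder in Keras: an LSTM initialized from the encoder states, followed by a `MultiHeadAttention` layer attending over the encoder outputs. The number of heads and the dimensions are assumptions for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_decoder(vocab_size=50_000, embed_dim=256, enc_dim=512, num_heads=8):
    prev_tokens = layers.Input(shape=(None,), dtype="int32", name="decoder_tokens")
    enc_outputs = layers.Input(shape=(None, enc_dim), name="encoder_outputs")
    state_h = layers.Input(shape=(enc_dim,), name="encoder_state_h")
    state_c = layers.Input(shape=(enc_dim,), name="encoder_state_c")

    # Previous-token embeddings, with the LSTM conditioned on the encoder's final states
    x = layers.Embedding(vocab_size, embed_dim)(prev_tokens)
    x, _, _ = layers.LSTM(enc_dim, return_sequences=True, return_state=True)(
        x, initial_state=[state_h, state_c]
    )
    # Multi-head cross-attention: decoder states query the encoder outputs
    context = layers.MultiHeadAttention(num_heads=num_heads, key_dim=enc_dim // num_heads)(
        query=x, value=enc_outputs, key=enc_outputs
    )
    x = layers.Concatenate()([x, context])
    logits = layers.Dense(vocab_size, name="next_token_logits")(x)  # next-token distribution
    return tf.keras.Model([prev_tokens, enc_outputs, state_h, state_c], logits, name="decoder")
```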
Prediction:
- Step-by-step decoding using trained weights
- Output Hindi sentence is reconstructed token by token
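A sketch of this greedy, step-by-step decoding loop is shown below, using the encoder/decoder models sketched above. The tokenizer helpers and the `start_id`/`end_id` markers are assumed placeholders, not the Space's actual inference code.

```python
import numpy as np

def translate(sentence, encoder, decoder, src_tokenize, tgt_detokenize,
              start_id, end_id, max_len=50):
    # Encode the English sentence once
    src_ids = src_tokenize(sentence)                        # assumed: array of shape (1, src_len)
    enc_outputs, state_h, state_c = encoder.predict(src_ids, verbose=0)

    # Feed the decoder one token at a time, starting from <start>
    tokens = [start_id]
    for _ in range(max_len):
        dec_in = np.array([tokens])
        logits = decoder.predict([dec_in, enc_outputs, state_h, state_c], verbose=0)
        next_id = int(np.argmax(logits[0, -1]))             # greedy pick of the next token
        if next_id == end_id:                               # stop at <end>
            break
        tokens.append(next_id)

    # Reconstruct the Hindi sentence token by token
    return tgt_detokenize(tokens[1:])
```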
## Usage
- Select the model size from the dropdown
- Expand **Show Model Architecture** to see layer details
- Select a sentence from the dataset
- Click **Translate** to view the predicted Hindi translation
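For reference, a minimal Streamlit sketch of this flow is shown below; `load_model`, `load_sentences`, and `translate` are hypothetical helpers, and the Space's actual app may be structured differently.

```python
import streamlit as st

model_size = st.selectbox("Model size", ["12M", "42M"])
model = load_model(model_size)                       # hypothetical helper

with st.expander("Show Model Architecture"):
    model.summary(print_fn=st.text)                  # layer-by-layer details

sentence = st.selectbox("Sentence to translate", load_sentences())  # hypothetical helper

if st.button("Translate"):
    st.write("Predicted Hindi translation:", translate(sentence, model))
```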
## Notes
- Model performance depends on training data size and domain
- The smaller model (12M) generalizes better on small datasets
- The larger model (42M) requires more data and fine-tuning to perform well on small datasets
## References
- Seq2Seq with Attention: Bahdanau et al., 2014
- Multi-Head Attention: Vaswani et al., 2017
## Author
Daksh Bhardwaj
Email: [email protected]
GitHub: Daksh5555