# English → Hindi Translation with Seq2Seq + Multi-Head Attention
This Streamlit Space demonstrates the power of LSTM-based models with attention mechanisms for sequence-to-sequence (Seq2Seq) tasks. Specifically, it showcases multi-head cross-attention in a translation setting.
## Purpose
This Space is designed to illustrate how LSTM-based Seq2Seq models combined with attention mechanisms can perform language translation. It is intended for educational and demonstration purposes, highlighting:
- Encoder-Decoder architecture using LSTMs
- Multi-head attention for better context understanding
- Sequence-to-sequence translation from English to Hindi
- Comparison between smaller (12M parameters) and larger (42M parameters) models
## Models
| Model | Parameters | Vocabulary | Training Data | Repository |
|---|---|---|---|---|
| Model A | 12M | 50k | 20k rows | seq2seq-lstm-multiheadattention-12.3 |
| Model B | 42M | 256k | 100k rows | seq2seq-lstm-multiheadattention-42 |
- Model A performs better on the smaller dataset it was trained on.
- Model B has higher capacity but requires more diverse data to generalize well.
## Features
- Select a model size (12M or 42M parameters)
- View model architecture layer-by-layer
- Choose a sentence from the dataset to translate
- Compare original vs predicted translation
- Highlight how multi-head attention improves Seq2Seq performance
## How it Works
Encoder:
- Processes the input English sentence
- Embedding → Layer Normalization → Dropout → BiLSTM → Hidden states
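A minimal Keras sketch of this encoder pipeline is shown below. The layer sizes (vocabulary, embedding dimension, LSTM units) are illustrative assumptions, not the Space's actual configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_encoder(vocab_size=50_000, embed_dim=256, lstm_units=256):
    # Illustrative sizes only; the deployed models may use different values.
    tokens = layers.Input(shape=(None,), dtype="int32", name="encoder_tokens")
    x = layers.Embedding(vocab_size, embed_dim)(tokens)   # Embedding
    x = layers.LayerNormalization()(x)                    # Layer Normalization
    x = layers.Dropout(0.1)(x)                            # Dropout
    # BiLSTM: per-token outputs plus forward/backward hidden and cell states
    outputs, fh, fc, bh, bc = layers.Bidirectional(
        layers.LSTM(lstm_units, return_sequences=True, return_state=True)
    )(x)
    state_h = layers.Concatenate()([fh, bh])  # hidden state handed to the decoder
    state_c = layers.Concatenate()([fc, bc])  # cell state handed to the decoder
    return tf.keras.Model(tokens, [outputs, state_h, state_c], name="encoder")
```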
Decoder:
- Receives previous token embeddings and encoder states
- Applies multi-head cross-attention over encoder outputs
- Generates the next token until the `<end>` token is reached
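The sketch below outlines such a decoder in Keras: an LSTM initialized from the encoder states, followed by a `MultiHeadAttention` layer attending over the encoder outputs. The number of heads and the dimensions are assumptions for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_decoder(vocab_size=50_000, embed_dim=256, enc_dim=512, num_heads=8):
    prev_tokens = layers.Input(shape=(None,), dtype="int32", name="decoder_tokens")
    enc_outputs = layers.Input(shape=(None, enc_dim), name="encoder_outputs")
    state_h = layers.Input(shape=(enc_dim,), name="encoder_state_h")
    state_c = layers.Input(shape=(enc_dim,), name="encoder_state_c")

    # Previous-token embeddings, with the LSTM conditioned on the encoder's final states
    x = layers.Embedding(vocab_size, embed_dim)(prev_tokens)
    x, _, _ = layers.LSTM(enc_dim, return_sequences=True, return_state=True)(
        x, initial_state=[state_h, state_c]
    )
    # Multi-head cross-attention: decoder states query the encoder outputs
    context = layers.MultiHeadAttention(num_heads=num_heads, key_dim=enc_dim // num_heads)(
        query=x, value=enc_outputs, key=enc_outputs
    )
    x = layers.Concatenate()([x, context])
    logits = layers.Dense(vocab_size, name="next_token_logits")(x)  # next-token distribution
    return tf.keras.Model([prev_tokens, enc_outputs, state_h, state_c], logits, name="decoder")
```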
Prediction:
- Step-by-step decoding using trained weights
- Output Hindi sentence is reconstructed token by token
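A sketch of this greedy, step-by-step decoding loop is shown below, using the encoder/decoder models sketched above. The tokenizer helpers and the `start_id`/`end_id` markers are assumed placeholders, not the Space's actual inference code.

```python
import numpy as np

def translate(sentence, encoder, decoder, src_tokenize, tgt_detokenize,
              start_id, end_id, max_len=50):
    # Encode the English sentence once
    src_ids = src_tokenize(sentence)                        # assumed: array of shape (1, src_len)
    enc_outputs, state_h, state_c = encoder.predict(src_ids, verbose=0)

    # Feed the decoder one token at a time, starting from <start>
    tokens = [start_id]
    for _ in range(max_len):
        dec_in = np.array([tokens])
        logits = decoder.predict([dec_in, enc_outputs, state_h, state_c], verbose=0)
        next_id = int(np.argmax(logits[0, -1]))             # greedy pick of the next token
        if next_id == end_id:                               # stop at <end>
            break
        tokens.append(next_id)

    # Reconstruct the Hindi sentence token by token
    return tgt_detokenize(tokens[1:])
```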
## Usage
- Select the model size from the dropdown
- Expand **Show Model Architecture** to see layer details
- Select a sentence from the dataset
- Click **Translate** to view the predicted Hindi translation
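For reference, a minimal Streamlit sketch of this flow is shown below; `load_model`, `load_sentences`, and `translate` are hypothetical helpers, and the Space's actual app may be structured differently.

```python
import streamlit as st

model_size = st.selectbox("Model size", ["12M", "42M"])
model = load_model(model_size)                       # hypothetical helper

with st.expander("Show Model Architecture"):
    model.summary(print_fn=st.text)                  # layer-by-layer details

sentence = st.selectbox("Sentence to translate", load_sentences())  # hypothetical helper

if st.button("Translate"):
    st.write("Predicted Hindi translation:", translate(sentence, model))
```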
## Notes
- Model performance depends on training data size and domain
- The smaller model (12M) generalizes better on small datasets
- The larger model (42M) requires more data and fine-tuning to perform well on small datasets
## References
- Seq2Seq with Attention: Bahdanau et al., 2014
- Multi-Head Attention: Vaswani et al., 2017
## Author
Daksh Bhardwaj
Email: [email protected]
GitHub: Daksh5555