---
colorTo: indigo
colorFrom: indigo
emoji: 👁
---

# English → Hindi Translation with Seq2Seq + Multi-Head Attention

This Streamlit Space demonstrates the **power of LSTMs combined with attention mechanisms** for sequence-to-sequence (Seq2Seq) tasks. Specifically, it showcases **multi-head cross-attention** in a translation setting.

---

## 🚀 Purpose

This Space is designed to **illustrate how LSTM-based Seq2Seq models combined with attention mechanisms** can perform language translation. It is intended for educational and demonstration purposes, highlighting:

- Encoder-decoder architecture using LSTMs
- Multi-head attention for better context understanding
- Sequence-to-sequence translation from English to Hindi
- Comparison between a **smaller (12M-parameter)** and a **larger (42M-parameter)** model

---

## 🧠 Models

| Model | Parameters | Vocabulary | Training Data | Repository |
|-------|------------|------------|---------------|------------|
| Model A | 12M | 50k | 20k rows | [seq2seq-lstm-multiheadattention-12.3](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) |
| Model B | 42M | 256k | 100k rows | [seq2seq-lstm-multiheadattention-42](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) |

- **Model A** performs better on the small dataset it was trained on.
- **Model B** has higher capacity but requires more diverse data to generalize well.

---

## 📋 Features

- Select a model size (12M or 42M parameters)
- View the **model architecture** layer by layer
- Choose a sentence from the dataset to translate
- Compare the **original vs. predicted translation**
- See how multi-head attention improves Seq2Seq performance

---

## 🛠 How It Works

1. **Encoder**:
   - Processes the input English sentence
   - Embedding → Layer Normalization → Dropout → BiLSTM → Hidden states
2. **Decoder**:
   - Receives the previous token embedding and the encoder states
   - Applies multi-head cross-attention over the encoder outputs
   - Generates the next token until the end-of-sequence token is produced
3. **Prediction**:
   - Step-by-step decoding using the trained weights
   - The output Hindi sentence is reconstructed token by token

Illustrative code sketches of this architecture and of the decoding loop are included at the end of this README.

---

## 💻 Usage

1. Select the model size from the dropdown
2. Expand **Show Model Architecture** to see layer details
3. Select a sentence from the dataset
4. Click **Translate** to view the predicted Hindi translation

---

## ⚠️ Notes

- Model performance depends on the **training data size and domain**
- The smaller model (12M) generalizes better on smaller datasets
- The larger model (42M) requires **more data** and **fine-tuning** to perform well on small datasets

---

## 📚 References

- **Seq2Seq with Attention**: [Bahdanau et al., 2014](https://arxiv.org/abs/1409.0473)
- **Multi-Head Attention**: [Vaswani et al., 2017](https://arxiv.org/abs/1706.03762)

---

## 👨‍💻 Author

Daksh Bhardwaj
Email: dakshbhardwaj0505@gmail.com
GitHub: [Daksh5555](https://github.com/daksh5555)
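
---

## 🧩 Illustrative Code Sketches

The sketch below shows the encoder-decoder-with-cross-attention pattern described in **How It Works**, written with standard Keras layers. It is a minimal illustration, not the actual model from the repositories above: the vocabulary sizes, layer dimensions, dropout rate, and variable names are all assumptions chosen for readability.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical sizes; the real models use much larger vocabularies and dimensions.
SRC_VOCAB, TGT_VOCAB = 8000, 8000
EMB_DIM, LSTM_UNITS, NUM_HEADS = 256, 256, 4

# --- Encoder: Embedding -> LayerNorm -> Dropout -> BiLSTM ---
enc_inputs = layers.Input(shape=(None,), dtype="int32", name="src_tokens")
x = layers.Embedding(SRC_VOCAB, EMB_DIM, mask_zero=True)(enc_inputs)
x = layers.LayerNormalization()(x)
x = layers.Dropout(0.2)(x)
enc_outputs, fwd_h, fwd_c, bwd_h, bwd_c = layers.Bidirectional(
    layers.LSTM(LSTM_UNITS, return_sequences=True, return_state=True)
)(x)
# Concatenate forward/backward states to initialize the decoder LSTM.
state_h = layers.Concatenate()([fwd_h, bwd_h])
state_c = layers.Concatenate()([fwd_c, bwd_c])

# --- Decoder: previous-token embeddings + multi-head cross-attention ---
dec_inputs = layers.Input(shape=(None,), dtype="int32", name="tgt_tokens")
y = layers.Embedding(TGT_VOCAB, EMB_DIM, mask_zero=True)(dec_inputs)
dec_lstm_out = layers.LSTM(2 * LSTM_UNITS, return_sequences=True)(
    y, initial_state=[state_h, state_c]
)
# Cross-attention: decoder states are the queries, encoder outputs the keys/values.
attn_out = layers.MultiHeadAttention(num_heads=NUM_HEADS, key_dim=LSTM_UNITS)(
    query=dec_lstm_out, value=enc_outputs, key=enc_outputs
)
merged = layers.Concatenate()([dec_lstm_out, attn_out])
logits = layers.Dense(TGT_VOCAB, name="next_token_logits")(merged)

model = Model([enc_inputs, dec_inputs], logits)
model.summary()
```

Using multi-head cross-attention lets each decoding step look back at every encoder position from several learned subspaces at once, which is the "better context understanding" highlighted in the Purpose section.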
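The step-by-step prediction described above amounts to a greedy decoding loop. The sketch below shows that loop in isolation; `next_token_logits`, `BOS_ID`, and `EOS_ID` are stand-ins for the trained decoder step and its special-token ids, not part of the actual Space code.

```python
import numpy as np

def next_token_logits(src_ids, tgt_prefix):
    """Stand-in for one trained decoder step: returns random logits over a toy
    vocabulary. In the Space, this would run the trained Seq2Seq model instead."""
    rng = np.random.default_rng(len(tgt_prefix))
    return rng.normal(size=8000)

BOS_ID, EOS_ID, MAX_TGT_LEN = 1, 2, 20  # hypothetical special-token ids and length cap

def greedy_decode(src_ids):
    """Builds the Hindi token sequence one id at a time until EOS or the length cap."""
    tgt = [BOS_ID]
    for _ in range(MAX_TGT_LEN):
        next_id = int(np.argmax(next_token_logits(src_ids, tgt)))
        if next_id == EOS_ID:
            break
        tgt.append(next_id)
    return tgt[1:]  # drop BOS; the target tokenizer maps these ids back to Hindi words

print(greedy_decode([5, 42, 7]))  # random ids here, so the output is meaningless
```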