---
colorTo: indigo
colorFrom: indigo
emoji: π
---
# English → Hindi Translation with Seq2Seq + Multi-Head Attention
This Streamlit Space demonstrates the **power of LSTMs combined with attention mechanisms** for sequence-to-sequence (Seq2Seq) tasks. Specifically, it showcases **multi-head cross-attention** in a translation setting.
---
## 📌 Purpose
This Space is designed to **illustrate how LSTM-based Seq2Seq models combined with attention mechanisms** can perform language translation. It is intended for educational and demonstration purposes, highlighting:
- Encoder-Decoder architecture using LSTMs
- Multi-head attention for better context understanding (see the formula below)
- Sequence-to-sequence translation from English to Hindi
- Comparison between **smaller (12M parameters)** and **larger (42M parameters)** models
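
The attention used here is the scaled dot-product, multi-head formulation of Vaswani et al. (2017). In the cross-attention setting of this Space, the queries $Q$ come from the decoder while the keys $K$ and values $V$ come from the encoder outputs:

$$
\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V
$$

$$
\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \ldots, \text{head}_h)\,W^O, \qquad \text{head}_i = \text{Attention}(QW_i^Q, KW_i^K, VW_i^V)
$$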
---
## 🧠 Models
| Model | Parameters | Vocabulary Size | Training Data | Repository |
|-------|------------|-----------|---------------|------------|
| Model A | 12M | 50k | 20k rows | [seq2seq-lstm-multiheadattention-12.3](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) |
| Model B | 42M | 256k | 100k rows | [seq2seq-lstm-multiheadattention-42](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) |
- **Model A** performs better on the smaller dataset it was trained on.
- **Model B** has higher capacity but requires more diverse data to generalize well.
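
Both rows link to the same Hub repository, so the weights can be pulled with `huggingface_hub`. A minimal sketch, assuming hypothetical filenames (check the repo's file listing for the real ones):

```python
from huggingface_hub import hf_hub_download

# The filename below is hypothetical -- see the repo for the actual files
weights_12m = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="model_12m.h5",
)
```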
---
## 🚀 Features
- Select a model size (12M or 42M parameters)
- View **model architecture** layer-by-layer
- Choose a sentence from the dataset to translate
- Compare **original vs predicted translation**
- Highlight how multi-head attention improves Seq2Seq performance
---
## 🔍 How it Works
1. **Encoder**:
- Processes the input English sentence
   - Embedding → Layer Normalization → Dropout → BiLSTM → Hidden states
2. **Decoder**:
- Receives previous token embeddings and encoder states
- Applies multi-head cross-attention over encoder outputs
   - Generates the next token repeatedly until the `<end>` token is produced
3. **Prediction**:
- Step-by-step decoding using trained weights
   - The output Hindi sentence is reconstructed token by token (see the sketch after this list)
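
For concreteness, here is a minimal Keras sketch of the pipeline above. Layer sizes, the dropout rate, and the way encoder states initialize the decoder are illustrative assumptions, not the exact configuration of the released models:

```python
# Illustrative sketch only -- sizes and wiring are assumptions, not the
# exact configuration of the 12M/42M models.
import tensorflow as tf
from tensorflow.keras import layers

SRC_VOCAB, TGT_VOCAB = 50_000, 50_000   # assumed vocabulary sizes
EMB_DIM, UNITS, HEADS = 256, 256, 4     # assumed layer sizes

# --- Encoder: Embedding -> LayerNorm -> Dropout -> BiLSTM -> hidden states ---
enc_in = layers.Input(shape=(None,), name="english_tokens")
x = layers.Embedding(SRC_VOCAB, EMB_DIM, mask_zero=True)(enc_in)
x = layers.LayerNormalization()(x)
x = layers.Dropout(0.2)(x)
enc_seq, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(UNITS, return_sequences=True, return_state=True)
)(x)
enc_h = layers.Concatenate()([fh, bh])  # merge forward/backward states
enc_c = layers.Concatenate()([fc, bc])

# --- Decoder: previous-token embeddings + encoder states ---
dec_in = layers.Input(shape=(None,), name="hindi_tokens")
y = layers.Embedding(TGT_VOCAB, EMB_DIM, mask_zero=True)(dec_in)
dec_seq = layers.LSTM(2 * UNITS, return_sequences=True)(
    y, initial_state=[enc_h, enc_c]
)

# Multi-head cross-attention: queries from the decoder,
# keys/values from the encoder outputs
context = layers.MultiHeadAttention(num_heads=HEADS, key_dim=UNITS)(
    query=dec_seq, value=enc_seq, key=enc_seq
)

# Next-token distribution over the Hindi vocabulary
probs = layers.Dense(TGT_VOCAB, activation="softmax")(
    layers.Concatenate()([dec_seq, context])
)

model = tf.keras.Model([enc_in, dec_in], probs)
```

At inference time, decoding is greedy: the decoder starts from a `<start>` token, the most probable next token is appended at each step, and generation stops once `<end>` is emitted.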
---
## 💻 Usage
1. Select the model size from the dropdown
2. Expand **Show Model Architecture** to see layer details
3. Select a sentence from the dataset
4. Click **Translate** to view the predicted Hindi translation
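
A minimal Streamlit sketch of this flow; the widget labels mirror the app, but `translate` is a hypothetical stand-in for the real model inference code:

```python
import streamlit as st

def translate(sentence: str, size: str) -> str:
    # Hypothetical placeholder -- the real app runs the Seq2Seq model here
    return f"[{size} model] translated output for: {sentence}"

model_size = st.selectbox("Model size", ["12M", "42M"])

with st.expander("Show Model Architecture"):
    st.text(f"Layer-by-layer summary of the {model_size} model")

sentence = st.selectbox("Choose a sentence", ["how are you?", "good morning"])

if st.button("Translate"):
    st.write(translate(sentence, model_size))
```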
---
## ⚠️ Notes
- Model performance depends on **training data size and domain**
- The smaller model (12M) generalizes better on small datasets
- The larger model (42M) requires **more data** or **fine-tuning** to perform well on small datasets
---
## 📚 References
- **Seq2Seq with Attention**: [Bahdanau et al., 2014](https://arxiv.org/abs/1409.0473)
- **Multi-Head Attention**: [Vaswani et al., 2017](https://arxiv.org/abs/1706.03762)
---
## 👨‍💻 Author
Daksh Bhardwaj
Email: [email protected]
GitHub: [Daksh5555](https://github.com/daksh5555)