---
colorTo: indigo
colorFrom: indigo
emoji: πŸ‘
---
# English β†’ Hindi Translation with Seq2Seq + Multi-Head Attention

This Streamlit Space demonstrates the **power of LSTM encoder-decoder models combined with attention mechanisms** for sequence-to-sequence (Seq2Seq) tasks. Specifically, it showcases **multi-head cross-attention**, in which the decoder attends over the encoder's outputs, in an English-to-Hindi translation setting.

---

## πŸš€ Purpose

This Space is designed to **illustrate how LSTM-based Seq2Seq models combined with attention mechanisms** can perform language translation. It is intended for educational and demonstration purposes, highlighting:

- Encoder-Decoder architecture using LSTMs
- Multi-head attention for better context understanding
- Sequence-to-sequence translation from English to Hindi
- Comparison between **smaller (12M parameters)** and **larger (42M parameters)** models

---

## 🧠 Models

| Model | Parameters | Vocabulary size | Training data | Repository |
|-------|------------|-----------|---------------|------------|
| Model A | 12M | 50k | 20k rows | [seq2seq-lstm-multiheadattention-12.3](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) |
| Model B | 42M | 256k | 100k rows | [seq2seq-lstm-multiheadattention-42](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) |

- **Model A** performs better on the small dataset it was trained on.  
- **Model B** has higher capacity but requires more diverse data to generalize well (see the loading sketch below).  

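For reference, a checkpoint from the repository above can be fetched with `huggingface_hub`. This is a minimal sketch, assuming the weights are stored as a single file in the repo; the filename `model_42M.h5` is a hypothetical placeholder, since the repo's actual file layout is not listed here.

```python
# Minimal sketch: download a checkpoint file from the Hugging Face Hub.
from huggingface_hub import hf_hub_download

weights_path = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="model_42M.h5",  # hypothetical filename; adjust to the repo's contents
)
print(weights_path)  # local path to the cached file
```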
---

## πŸ“‹ Features

- Select a model size (12M or 42M parameters)  
- View **model architecture** layer-by-layer  
- Choose a sentence from the dataset to translate  
- Compare **original vs predicted translation**  
- Highlight how multi-head attention improves Seq2Seq performance  

---

## πŸ›  How it Works

1. **Encoder**:  
   - Processes the input English sentence  
   - Embedding β†’ Layer Normalization β†’ Dropout β†’ BiLSTM β†’ Hidden states  

2. **Decoder**:  
   - Receives previous token embeddings and encoder states  
   - Applies multi-head cross-attention over encoder outputs  
   - Generates tokens one at a time until the `<end>` token is produced  

3. **Prediction**:  
   - Step-by-step decoding using trained weights  
   - The output Hindi sentence is reconstructed token by token (a code sketch of all three steps follows this list)  

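The sketch below makes these three steps concrete in TensorFlow/Keras (an assumed framework; the Space's actual code is not shown here). All sizes, the vocabulary counts, embedding width, LSTM units, head count, and the `greedy_decode` helper are illustrative assumptions, not the trained models' real configuration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

ENC_VOCAB, DEC_VOCAB = 50_000, 50_000  # assumed vocabulary sizes
EMB_DIM, UNITS, HEADS = 256, 256, 4    # assumed hyperparameters

# 1. Encoder: Embedding -> LayerNorm -> Dropout -> BiLSTM -> hidden states
enc_in = layers.Input(shape=(None,), dtype="int32", name="encoder_tokens")
x = layers.Embedding(ENC_VOCAB, EMB_DIM, mask_zero=True)(enc_in)
x = layers.LayerNormalization()(x)
x = layers.Dropout(0.2)(x)
enc_seq, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(UNITS, return_sequences=True, return_state=True)
)(x)
state_h = layers.Concatenate()([fh, bh])  # merge both directions to seed the decoder
state_c = layers.Concatenate()([fc, bc])

# 2. Decoder: previous-token embeddings, then multi-head cross-attention
dec_in = layers.Input(shape=(None,), dtype="int32", name="decoder_tokens")
y = layers.Embedding(DEC_VOCAB, EMB_DIM, mask_zero=True)(dec_in)
y = layers.LSTM(2 * UNITS, return_sequences=True)(
    y, initial_state=[state_h, state_c]
)
# Decoder states are the queries; encoder outputs supply keys and values
attn = layers.MultiHeadAttention(num_heads=HEADS, key_dim=UNITS)(
    query=y, value=enc_seq, key=enc_seq
)
y = layers.Concatenate()([y, attn])
logits = layers.Dense(DEC_VOCAB, activation="softmax")(y)
model = tf.keras.Model([enc_in, dec_in], logits)

# 3. Prediction: greedy, token-by-token decoding until <end>
def greedy_decode(src_ids, start_id, end_id, max_len=50):
    """src_ids: int array of shape (1, src_len). Returns predicted token ids."""
    out = [start_id]
    for _ in range(max_len):
        probs = model.predict([src_ids, np.array([out])], verbose=0)
        next_id = int(probs[0, -1].argmax())  # most likely next token
        if next_id == end_id:
            break
        out.append(next_id)
    return out[1:]
```

The key design choice is that the decoder's LSTM states act as attention queries while the encoder outputs supply the keys and values, which is what distinguishes cross-attention from self-attention.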
---

## πŸ’» Usage

1. Select the model size from the dropdown  
2. Expand **Show Model Architecture** to see layer details  
3. Select a sentence from the dataset  
4. Click **Translate** to view the predicted Hindi translation (a minimal UI sketch follows)  

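This is roughly what the flow looks like in Streamlit; a minimal sketch, where `load_model`, `translate`, and `DATASET_SENTENCES` are hypothetical stand-ins for the Space's actual helpers:

```python
import streamlit as st

# Hypothetical stand-ins for the Space's real helpers (assumptions, not its code)
DATASET_SENTENCES = ["How are you?", "I love reading books."]

def load_model(size: str):
    """Would load and cache the 12M or 42M checkpoint."""
    raise NotImplementedError

def translate(model, text: str) -> str:
    """Would run step-by-step decoding and detokenize the Hindi output."""
    raise NotImplementedError

size = st.selectbox("Model size", ["12M parameters", "42M parameters"])

with st.expander("Show Model Architecture"):
    st.text(f"(layer-by-layer summary of the {size} model would render here)")

sentence = st.selectbox("Sentence to translate", DATASET_SENTENCES)
if st.button("Translate"):
    model = load_model(size)
    st.write("**Predicted Hindi:**", translate(model, sentence))
```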
---

## ⚠️ Notes

- Model performance depends on **training data size and domain**  
- The smaller model (12M) generalizes better when trained on smaller datasets  
- The larger model (42M) needs **more data** or **fine-tuning** to perform well on small datasets  

---

## πŸ“š References

- **Seq2Seq with Attention**: [Bahdanau et al., 2014](https://arxiv.org/abs/1409.0473)  
- **Multi-Head Attention**: [Vaswani et al., 2017](https://arxiv.org/abs/1706.03762)  

---

## πŸ‘¨β€πŸ’» Author

Daksh Bhardwaj  
Email: [email protected]  
GitHub: [Daksh5555](https://github.com/daksh5555)