---
colorTo: indigo
colorFrom: indigo
emoji: πŸ‘
---
# English β†’ Hindi Translation with Seq2Seq + Multi-Head Attention

This Streamlit Space demonstrates the **power of LSTM encoder-decoder models combined with attention mechanisms** for sequence-to-sequence (Seq2Seq) tasks. Specifically, it showcases **multi-head cross-attention**, in which the decoder attends over the encoder's outputs, in an English-to-Hindi translation setting.

---

## πŸš€ Purpose

This Space is designed to **illustrate how LSTM-based Seq2Seq models combined with attention mechanisms** can perform language translation. It is intended for educational and demonstration purposes, highlighting:

- Encoder-Decoder architecture using LSTMs
- Multi-head attention for better context understanding
- Sequence-to-sequence translation from English to Hindi
- Comparison between **smaller (12M parameters)** and **larger (42M parameters)** models

---

## 🧠 Models

| Model | Parameters | Vocabulary size | Training data | Repository |
|-------|------------|-----------|---------------|------------|
| Model A | 12M | 50k | 20k rows | [seq2seq-lstm-multiheadattention-12.3](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) |
| Model B | 42M | 256k | 100k rows | [seq2seq-lstm-multiheadattention-42](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) |

- **Model A** performs better on the small dataset it was trained on.  
- **Model B** has higher capacity but requires more diverse data to generalize well (see the loading sketch below).  

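For reference, a checkpoint from the repository above can be fetched with `huggingface_hub`. This is a minimal sketch, assuming the weights are stored as a single file in the repo; the filename `model_42M.h5` is a hypothetical placeholder, since the repo's actual file layout is not listed here.

```python
# Minimal sketch: download a checkpoint file from the Hugging Face Hub.
from huggingface_hub import hf_hub_download

weights_path = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="model_42M.h5",  # hypothetical filename; adjust to the repo's contents
)
print(weights_path)  # local path to the cached file
```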
---

## πŸ“‹ Features

- Select a model size (12M or 42M parameters)  
- View **model architecture** layer-by-layer  
- Choose a sentence from the dataset to translate  
- Compare **original vs predicted translation**  
- Highlight how multi-head attention improves Seq2Seq performance  

---

## πŸ›  How it Works

1. **Encoder**:  
   - Processes the input English sentence  
   - Embedding β†’ Layer Normalization β†’ Dropout β†’ BiLSTM β†’ Hidden states  

2. **Decoder**:  
   - Receives previous token embeddings and encoder states  
   - Applies multi-head cross-attention over encoder outputs  
   - Generates tokens one at a time until the `<end>` token is produced  

3. **Prediction**:  
   - Step-by-step decoding using trained weights  
   - The output Hindi sentence is reconstructed token by token (a code sketch of all three steps follows this list)  

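The sketch below makes these three steps concrete in TensorFlow/Keras (an assumed framework; the Space's actual code is not shown here). All sizes, the vocabulary counts, embedding width, LSTM units, head count, and the `greedy_decode` helper are illustrative assumptions, not the trained models' real configuration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

ENC_VOCAB, DEC_VOCAB = 50_000, 50_000  # assumed vocabulary sizes
EMB_DIM, UNITS, HEADS = 256, 256, 4    # assumed hyperparameters

# 1. Encoder: Embedding -> LayerNorm -> Dropout -> BiLSTM -> hidden states
enc_in = layers.Input(shape=(None,), dtype="int32", name="encoder_tokens")
x = layers.Embedding(ENC_VOCAB, EMB_DIM, mask_zero=True)(enc_in)
x = layers.LayerNormalization()(x)
x = layers.Dropout(0.2)(x)
enc_seq, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(UNITS, return_sequences=True, return_state=True)
)(x)
state_h = layers.Concatenate()([fh, bh])  # merge both directions to seed the decoder
state_c = layers.Concatenate()([fc, bc])

# 2. Decoder: previous-token embeddings, then multi-head cross-attention
dec_in = layers.Input(shape=(None,), dtype="int32", name="decoder_tokens")
y = layers.Embedding(DEC_VOCAB, EMB_DIM, mask_zero=True)(dec_in)
y = layers.LSTM(2 * UNITS, return_sequences=True)(
    y, initial_state=[state_h, state_c]
)
# Decoder states are the queries; encoder outputs supply keys and values
attn = layers.MultiHeadAttention(num_heads=HEADS, key_dim=UNITS)(
    query=y, value=enc_seq, key=enc_seq
)
y = layers.Concatenate()([y, attn])
logits = layers.Dense(DEC_VOCAB, activation="softmax")(y)
model = tf.keras.Model([enc_in, dec_in], logits)

# 3. Prediction: greedy, token-by-token decoding until <end>
def greedy_decode(src_ids, start_id, end_id, max_len=50):
    """src_ids: int array of shape (1, src_len). Returns predicted token ids."""
    out = [start_id]
    for _ in range(max_len):
        probs = model.predict([src_ids, np.array([out])], verbose=0)
        next_id = int(probs[0, -1].argmax())  # most likely next token
        if next_id == end_id:
            break
        out.append(next_id)
    return out[1:]
```

The key design choice is that the decoder's LSTM states act as attention queries while the encoder outputs supply the keys and values, which is what distinguishes cross-attention from self-attention.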
---

## πŸ’» Usage

1. Select the model size from the dropdown  
2. Expand **Show Model Architecture** to see layer details  
3. Select a sentence from the dataset  
4. Click **Translate** to view the predicted Hindi translation (a minimal UI sketch follows)  

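This is roughly what the flow looks like in Streamlit; a minimal sketch, where `load_model`, `translate`, and `DATASET_SENTENCES` are hypothetical stand-ins for the Space's actual helpers:

```python
import streamlit as st

# Hypothetical stand-ins for the Space's real helpers (assumptions, not its code)
DATASET_SENTENCES = ["How are you?", "I love reading books."]

def load_model(size: str):
    """Would load and cache the 12M or 42M checkpoint."""
    raise NotImplementedError

def translate(model, text: str) -> str:
    """Would run step-by-step decoding and detokenize the Hindi output."""
    raise NotImplementedError

size = st.selectbox("Model size", ["12M parameters", "42M parameters"])

with st.expander("Show Model Architecture"):
    st.text(f"(layer-by-layer summary of the {size} model would render here)")

sentence = st.selectbox("Sentence to translate", DATASET_SENTENCES)
if st.button("Translate"):
    model = load_model(size)
    st.write("**Predicted Hindi:**", translate(model, sentence))
```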
---

## ⚠️ Notes

- Model performance depends on **training data size and domain**  
- The smaller model (12M) generalizes better when trained on smaller datasets  
- The larger model (42M) needs **more data** or **fine-tuning** to perform well on small datasets  

---

## πŸ“š References

- **Seq2Seq with Attention**: [Bahdanau et al., 2014](https://arxiv.org/abs/1409.0473)  
- **Multi-Head Attention**: [Vaswani et al., 2017](https://arxiv.org/abs/1706.03762)  

---

## πŸ‘¨β€πŸ’» Author

Daksh Bhardwaj  
Email: [email protected]  
GitHub: [Daksh5555](https://github.com/daksh5555)