# Seq2Seq LSTM with Multi-Head Attention for English → Hindi Translation

## Model Overview

This model performs **English to Hindi translation** using a **Seq2Seq architecture** with **LSTM-based encoder-decoder** and **multi-head cross-attention**. The attention mechanism helps the decoder focus on relevant parts of the input sentence during translation.

- **Architecture**: BiLSTM Encoder + LSTM Decoder + Multi-Head Cross-Attention
- **Task**: Language Translation (English → Hindi)
- **License**: Open for research and demonstration purposes (educational use)

---

## Model Versions

| Model | Parameters | Vocabulary | Training Data | Repository |
|-------|------------|-----------|---------------|------------|
| Model A | 12M | 50k | 20k English-Hindi sentence pairs | [seq2seq-lstm-multiheadattention-12.3](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) |
| Model B | 42M | 256k | 100k English-Hindi sentence pairs | [seq2seq-lstm-multiheadattention-42](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) |

- **Model A** is smaller and performs well on the dataset it was trained on.  
- **Model B** has higher capacity but needs more data for robust generalization.  

---

## Intended Use

- Demonstration and educational purposes  
- Understanding **Seq2Seq + Attention mechanisms**  
- Translating English sentences to Hindi
- **Feature extraction**: Encoder outputs can be used for downstream NLP tasks by generating contextual embedding vectors that capture sentence-level semantics
---

## Not Intended For

- High-stakes or production translation systems without further fine-tuning  
- Handling very large or domain-specific datasets without retraining

---

## Metrics

- Evaluated qualitatively on selected test sentences  
- **Model A**: good accuracy for small, simple sentences  
- **Model B**: may require larger datasets for generalization  

*BLEU or other quantitative metrics can be added if evaluation is performed.*

---

## Training Data

- **Source**: Collected English-Hindi parallel sentences  
- **Size**:
  - Model A: 20k sentence pairs  
  - Model B: 100k sentence pairs  
- **Preprocessing**: Tokenization, padding, `<start>` / `<end>` tokens
- **Dataset**: For further Fine-Tuning, training dataset is available in this model card

---

## Limitations

- Larger model may underperform if trained on small datasets  
- Handles only sentence-level translation; not optimized for paragraphs  
- May produce incorrect translations for rare words or out-of-vocabulary terms
- Larger model is only trained for epoch 1, so do not used it without fine tuning on your own dataset

---

## Example Usage

```python
from huggingface_hub import hf_hub_download
from tensorflow.keras.models import load_model
import pickle

# Load model
model_path = hf_hub_download(repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention", filename="seq2seq-lstm-multiheadattention-12.3.keras")
model = load_model(model_path)

# Load tokenizers
tokenizer_path = hf_hub_download(repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention", filename="seq2seq-tokenizers-12.3M.pkl")
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)
tokenizer_en = tokenizer['english']
tokenizer_hi = tokenizer['hindi']
```

---

## Step-by-Step Prediction Example
For Encoder-Decoder inference visit [Daksh0505/Seq2Seq-LSTM-MultiHeadAttention-Translation](https://huggingface.co/spaces/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention-Translation)
```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def preprocess_input(sentence, word2idx_en, max_seq_len, oov_token="<OOV>"):
    oov_idx = word2idx_en[oov_token]
    seq = [word2idx_en.get(w.lower(), oov_idx) for w in sentence.split()]
    return pad_sequences([seq], maxlen=max_seq_len, padding='post')

def decode_sequence(input_seq, encoder_model, decoder_model, word2idx_hi, idx2word_hi, max_seq_len):
    start_token = word2idx_hi['<start>']
    end_token = word2idx_hi['<end>']

    enc_outs, h, c = encoder_model.predict(input_seq, verbose=0)
    target_seq = np.array([[start_token]])
    decoded_sentence = []

    for _ in range(max_seq_len):
        output_tokens, h, c = decoder_model.predict([target_seq, h, c, enc_outs], verbose=0)
        sampled_idx = np.argmax(output_tokens[0,0,:])
        if sampled_idx == end_token:
            break
        if sampled_idx > 0:
            decoded_sentence.append(idx2word_hi[sampled_idx])
        target_seq[0,0] = sampled_idx

    return " ".join(decoded_sentence)

# Example usage
sentence = "Hello, how are you?"
input_seq = preprocess_input(sentence, tokenizer_en.word_index, max_seq_len=40)
translation = decode_sequence(input_seq, encoder_model, decoder_model, tokenizer_hi.word_index, tokenizer_hi.index_word, max_seq_len=40)
print("Predicted Hindi Translation:", translation)
```