# Seq2Seq LSTM with Multi-Head Attention for English → Hindi Translation

## Model Overview

This model performs **English to Hindi translation** using a **Seq2Seq architecture** with an **LSTM-based encoder-decoder** and **multi-head cross-attention**. The attention mechanism lets the decoder focus on the relevant parts of the input sentence at each decoding step.

- **Architecture**: BiLSTM Encoder + LSTM Decoder + Multi-Head Cross-Attention
- **Task**: Language Translation (English → Hindi)
- **License**: Open for research and demonstration purposes (educational use)

---

## Model Versions

| Model | Parameters | Vocabulary | Training Data | Repository |
|-------|------------|------------|---------------|------------|
| Model A | 12M | 50k | 20k English-Hindi sentence pairs | [seq2seq-lstm-multiheadattention-12.3](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) |
| Model B | 42M | 256k | 100k English-Hindi sentence pairs | [seq2seq-lstm-multiheadattention-42](https://huggingface.co/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention) |

- **Model A** is smaller and performs well on the dataset it was trained on.
- **Model B** has higher capacity but needs more data for robust generalization.

---

## Intended Use

- Demonstration and educational purposes
- Understanding **Seq2Seq + Attention mechanisms**
- Translating English sentences to Hindi
- **Feature extraction**: Encoder outputs can serve as contextual embedding vectors that capture sentence-level semantics for downstream NLP tasks (a sketch appears at the end of this card)

---

## Not Intended For

- High-stakes or production translation systems without further fine-tuning
- Handling very large or domain-specific datasets without retraining

---

## Metrics

- Evaluated qualitatively on selected test sentences
- **Model A**: good accuracy on short, simple sentences
- **Model B**: may require larger datasets for generalization

*BLEU or other quantitative metrics can be added if evaluation is performed; a BLEU sketch appears at the end of this card.*

---

## Training Data

- **Source**: Collected English-Hindi parallel sentences
- **Size**:
  - Model A: 20k sentence pairs
  - Model B: 100k sentence pairs
- **Preprocessing**: Tokenization, padding, and start/end sentinel tokens (`<start>` / `<end>` in the examples below)
- **Dataset**: The training dataset is available in this model card for further fine-tuning

---

## Limitations

- The larger model may underperform if trained on small datasets
- Handles only sentence-level translation; not optimized for paragraphs
- May produce incorrect translations for rare words or out-of-vocabulary terms
- The larger model was trained for only one epoch, so do not use it without fine-tuning on your own dataset

---

## Example Usage

```python
from huggingface_hub import hf_hub_download
from tensorflow.keras.models import load_model
import pickle

# Download and load the trained model
model_path = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="seq2seq-lstm-multiheadattention-12.3.keras",
)
model = load_model(model_path)

# Download and load the fitted English/Hindi tokenizers
tokenizer_path = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="seq2seq-tokenizers-12.3M.pkl",
)
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)

tokenizer_en = tokenizer['english']
tokenizer_hi = tokenizer['hindi']
```

---

## Step-by-Step Prediction Example

For the complete encoder-decoder inference setup, see the demo Space: [Daksh0505/Seq2Seq-LSTM-MultiHeadAttention-Translation](https://huggingface.co/spaces/Daksh0505/Seq2Seq-LSTM-MultiHeadAttention-Translation).
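The decoding loop below expects separate `encoder_model` and `decoder_model` inference models, which the linked Space builds from the trained network. The following is a minimal sketch of that split, not the card's official code: the layer names (`encoder_inputs`, `encoder_bilstm`, `decoder_embedding`, `decoder_lstm`, `cross_attention`, `output_dense`), the state width `units`, and the wiring of the output head are all assumptions; compare against `model.summary()` and the Space's source for the real structure.

```python
from tensorflow.keras.layers import Concatenate, Input
from tensorflow.keras.models import Model

units = 512  # hypothetical decoder state size; read the real value off model.summary()

# Encoder sub-model: a BiLSTM returns forward/backward states separately,
# so concatenate them to form the decoder's initial state.
enc_inputs = model.get_layer("encoder_inputs").input
enc_outs, fh, fc, bh, bc = model.get_layer("encoder_bilstm").output
h = Concatenate()([fh, bh])
c = Concatenate()([fc, bc])
encoder_model = Model(enc_inputs, [enc_outs, h, c])

# Decoder sub-model, re-wired to run one step at a time.
dec_token = Input(shape=(1,), name="dec_token")       # previous target token id
state_h = Input(shape=(units,), name="state_h")
state_c = Input(shape=(units,), name="state_c")
enc_seq = Input(shape=(None, units), name="enc_seq")  # full encoder output sequence

x = model.get_layer("decoder_embedding")(dec_token)
x, h2, c2 = model.get_layer("decoder_lstm")(x, initial_state=[state_h, state_c])
ctx = model.get_layer("cross_attention")(query=x, value=enc_seq, key=enc_seq)
probs = model.get_layer("output_dense")(Concatenate()([x, ctx]))  # assumes the head sees [LSTM out; context]

decoder_model = Model([dec_token, state_h, state_c, enc_seq], [probs, h2, c2])
```

Concatenating the forward and backward BiLSTM states makes the initial decoder state match the decoder LSTM's width, which is why one `units` value covers both sub-models in this sketch.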
oov_token=""): oov_idx = word2idx_en[oov_token] seq = [word2idx_en.get(w.lower(), oov_idx) for w in sentence.split()] return pad_sequences([seq], maxlen=max_seq_len, padding='post') def decode_sequence(input_seq, encoder_model, decoder_model, word2idx_hi, idx2word_hi, max_seq_len): start_token = word2idx_hi[''] end_token = word2idx_hi[''] enc_outs, h, c = encoder_model.predict(input_seq, verbose=0) target_seq = np.array([[start_token]]) decoded_sentence = [] for _ in range(max_seq_len): output_tokens, h, c = decoder_model.predict([target_seq, h, c, enc_outs], verbose=0) sampled_idx = np.argmax(output_tokens[0,0,:]) if sampled_idx == end_token: break if sampled_idx > 0: decoded_sentence.append(idx2word_hi[sampled_idx]) target_seq[0,0] = sampled_idx return " ".join(decoded_sentence) # Example usage sentence = "Hello, how are you?" input_seq = preprocess_input(sentence, tokenizer_en.word_index, max_seq_len=40) translation = decode_sequence(input_seq, encoder_model, decoder_model, tokenizer_hi.word_index, tokenizer_hi.index_word, max_seq_len=40) print("Predicted Hindi Translation:", translation) ```