---
emoji: 👏
colorFrom: indigo
colorTo: indigo
---

# English → Hindi Translation with Seq2Seq + Multi-Head Attention

This Streamlit Space demonstrates how LSTM-based sequence-to-sequence (Seq2Seq) models combined with attention handle translation, specifically showcasing multi-head cross-attention in an English-to-Hindi setting.


πŸš€ Purpose

This Space is designed to illustrate how LSTM-based Seq2Seq models combined with attention mechanisms can perform language translation. It is intended for educational and demonstration purposes, highlighting:

- Encoder-Decoder architecture using LSTMs
- Multi-head attention for better context understanding
- Sequence-to-sequence translation from English to Hindi
- Comparison between a smaller (12M-parameter) and a larger (42M-parameter) model

## 🧠 Models

| Model   | Parameters | Vocabulary | Training Data | Repository                           |
|---------|------------|------------|---------------|--------------------------------------|
| Model A | 12M        | 50k        | 20k rows      | seq2seq-lstm-multiheadattention-12.3 |
| Model B | 42M        | 256k       | 100k rows     | seq2seq-lstm-multiheadattention-42   |

- Model A performs better on the smaller dataset it was trained on.
- Model B has higher capacity but requires larger, more diverse data to generalize well (a hedged download sketch follows below).
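To try either checkpoint locally, something like the following should work. Note that the `Daksh0505/` namespace and the `model.keras` filename are assumptions, not stated in this README:

```python
# Hedged sketch: fetch a checkpoint from the Hugging Face Hub.
# The "Daksh0505/" namespace and "model.keras" filename are assumptions.
from huggingface_hub import hf_hub_download
import tensorflow as tf

path = hf_hub_download(
    repo_id="Daksh0505/seq2seq-lstm-multiheadattention-42",  # or ...-12.3
    filename="model.keras",  # assumed filename
)
model = tf.keras.models.load_model(path)
model.summary()
```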

πŸ“‹ Features

- Select a model size (12M or 42M parameters)
- View the model architecture layer by layer
- Choose a sentence from the dataset to translate
- Compare the original and predicted translations
- Highlight how multi-head attention improves Seq2Seq performance

πŸ›  How it Works

1. **Encoder**
   - Processes the input English sentence
   - Embedding → Layer Normalization → Dropout → BiLSTM → Hidden states
2. **Decoder**
   - Receives previous token embeddings and encoder states
   - Applies multi-head cross-attention over the encoder outputs
   - Generates the next token until the `<end>` token is reached
3. **Prediction**
   - Step-by-step decoding using the trained weights
   - The output Hindi sentence is reconstructed token by token (see the sketch after this list)
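The sketch below wires these pieces together in Keras (a plausible stack for this Space, though the README does not name it): an Embedding → LayerNorm → Dropout → BiLSTM encoder, a decoder LSTM whose states query the encoder outputs through multi-head cross-attention, and a greedy step-by-step decode loop. All sizes, head counts, and token ids are illustrative assumptions, not the models' real hyperparameters; padding masks are omitted for brevity.

```python
# Minimal Keras sketch of the pipeline above; all hyperparameters are
# illustrative assumptions, not the Space's real values.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

SRC_VOCAB, TGT_VOCAB, EMB_DIM, UNITS, HEADS = 50_000, 50_000, 256, 256, 4

# --- 1. Encoder: Embedding -> LayerNorm -> Dropout -> BiLSTM ---
enc_in = tf.keras.Input(shape=(None,), name="english_tokens")
x = layers.Embedding(SRC_VOCAB, EMB_DIM)(enc_in)
x = layers.LayerNormalization()(x)
x = layers.Dropout(0.2)(x)
enc_out, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(UNITS, return_sequences=True, return_state=True)
)(x)
state_h = layers.Concatenate()([fh, bh])  # merge directions to seed the decoder
state_c = layers.Concatenate()([fc, bc])

# --- 2. Decoder: previous-token embeddings + multi-head cross-attention ---
dec_in = tf.keras.Input(shape=(None,), name="hindi_tokens_so_far")
y = layers.Embedding(TGT_VOCAB, EMB_DIM)(dec_in)
y = layers.LSTM(2 * UNITS, return_sequences=True)(y, initial_state=[state_h, state_c])
attn = layers.MultiHeadAttention(num_heads=HEADS, key_dim=UNITS)(
    query=y, value=enc_out, key=enc_out  # decoder states attend over encoder outputs
)
y = layers.Concatenate()([y, attn])
logits = layers.Dense(TGT_VOCAB)(y)  # distribution over the next Hindi token

model = tf.keras.Model([enc_in, dec_in], logits)

# --- 3. Prediction: greedy step-by-step decoding ---
def greedy_translate(src_ids, start_id=1, end_id=2, max_len=40):
    """Feed generated tokens back in until <end> (ids assumed) or max_len."""
    out = [start_id]
    for _ in range(max_len):
        step = model.predict([np.array([src_ids]), np.array([out])], verbose=0)
        next_id = int(step[0, -1].argmax())
        if next_id == end_id:  # stop at the <end> token
            break
        out.append(next_id)
    return out[1:]  # drop <start>
```

Re-running the full model at every step, as above, is simple but slow; the real app may instead cache the encoder outputs and carry the decoder state forward between steps.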

πŸ’» Usage

1. Select the model size from the dropdown
2. Expand **Show Model Architecture** to see layer details
3. Select a sentence from the dataset
4. Click **Translate** to view the predicted Hindi translation (a UI sketch follows below)
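For orientation, here is a minimal Streamlit sketch of that flow. It is not the Space's actual source; the sentence list and `translate` helper are placeholders standing in for the app's real code:

```python
# Minimal Streamlit sketch of the UI flow above; not the Space's real source.
import streamlit as st

SENTENCES = ["How are you?", "I love reading books."]  # placeholder dataset rows

def translate(sentence: str) -> str:
    """Placeholder: the real app runs the Seq2Seq decode loop shown earlier."""
    return "<predicted Hindi sentence>"

model_size = st.selectbox("Model size", ["12M", "42M"])  # step 1

with st.expander("Show Model Architecture"):  # step 2
    st.text(f"(layer-by-layer summary of the {model_size} model)")

sentence = st.selectbox("Choose an English sentence", SENTENCES)  # step 3

if st.button("Translate"):  # step 4
    st.write("Predicted Hindi:", translate(sentence))
```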

## ⚠️ Notes

- Model performance depends on the size and domain of the training data
- The smaller model (12M) generalizes better on smaller datasets
- The larger model (42M) needs more data, or fine-tuning, to perform well on small datasets

πŸ“š References


πŸ‘¨β€πŸ’» Author

**Daksh Bhardwaj**

- Email: [email protected]
- GitHub: Daksh5555