🧠 Emotion Sequence Transformer + BiLSTM MP478 Seq256

πŸ“˜ Overview

This repository provides a Transformer + BiLSTM emotion recognition model trained on MediaPipe facial landmark sequences. The model classifies human emotion into six categories: Angry, Disgust, Fear, Happy, Neutral, and Sad.

It processes temporal sequences of 256 frames per clip with 478 landmarks per frame, learning the dynamic patterns of human expression. The model is optimized for real-time emotion inference and can be used in applications such as sign language understanding and emotion-aware human-computer interaction.


🧩 Model Architecture

The model pairs Transformer encoder layers with a BiLSTM for sequence modeling (a minimal sketch follows the list):

  1. Input Layer
    • Accepts sequences of shape (256, 478 × 3), i.e. the 3D coordinates of 478 landmarks over 256 frames (1,434 features per frame).
  2. Transformer Encoder Layers
    • Capture temporal dependencies and dynamic patterns of facial motion using self-attention.
  3. Fully Connected Layers
    • Map the encoder outputs to logits for the six emotion classes.
  4. Output Layer
    • Softmax activation for multi-class emotion classification.
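
A minimal PyTorch sketch of such an architecture follows. The layer counts, model width, and head count are illustrative assumptions, not the released model's exact hyperparameters:

```python
# Hedged architectural sketch (sizes are illustrative assumptions).
import torch
from torch import nn

class EmotionSequenceModel(nn.Module):
    def __init__(self, feat_dim=478 * 3, d_model=256, num_classes=6):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)             # input layer
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.bilstm = nn.LSTM(d_model, d_model // 2,
                              bidirectional=True, batch_first=True)
        self.fc = nn.Linear(d_model, num_classes)            # logits for 6 classes

    def forward(self, x):                   # x: (batch, 256, 1434)
        h = self.proj(x)
        h = self.encoder(h)
        h, _ = self.bilstm(h)
        return self.fc(h.mean(dim=1))       # softmax is applied by the loss / at inference
```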

πŸ“Š Dataset

Custom MediaPipe Landmark Dataset

  • Extracted from labeled video clips representing six emotions.
  • Preprocessing includes normalization, sequence grouping (256 frames per clip), and balanced augmentation.
  • Total dataset is split into training, validation, and test sets.
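
For reference, here is a hedged sketch of per-frame landmark extraction with MediaPipe Face Mesh (with `refine_landmarks=True`, Face Mesh returns 478 points). The repository's own extraction pipeline lives in the usage notebook and may differ in detail:

```python
# Hedged sketch: per-frame 478-landmark extraction with MediaPipe Face Mesh.
import cv2
import mediapipe as mp
import numpy as np

face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=False,
    refine_landmarks=True,   # 468 mesh points + 10 iris points = 478
    max_num_faces=1)

def frame_landmarks(bgr_frame):
    """Return a (478, 3) array of x/y/z landmark coordinates, or None."""
    result = face_mesh.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    if not result.multi_face_landmarks:
        return None
    lms = result.multi_face_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lms], dtype=np.float32)
```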

βš™οΈ Training Configuration

| Parameter | Value |
|---|---|
| Architecture | Transformer + BiLSTM |
| Sequence Length | 256 frames |
| Input Features | 478 landmarks × 3 coordinates |
| Optimizer | Adam |
| Learning Rate | 1e-4 |
| Loss Function | CrossEntropyLoss |
| Batch Size | 32 |
| Epochs | 60 |
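
A minimal training-loop sketch matching this configuration is shown below. It reuses the `EmotionSequenceModel` sketch from the architecture section, and a random `TensorDataset` stands in for the real landmark dataset:

```python
# Hedged training-loop sketch; the random TensorDataset is a stand-in
# for the actual MediaPipe landmark dataset, which is not distributed here.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = EmotionSequenceModel()                      # sketch defined earlier
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

dummy = TensorDataset(torch.randn(64, 256, 478 * 3),
                      torch.randint(0, 6, (64,)))
loader = DataLoader(dummy, batch_size=32, shuffle=True)

for epoch in range(60):
    model.train()
    for sequences, labels in loader:                # (32, 256, 1434), (32,)
        optimizer.zero_grad()
        loss = criterion(model(sequences), labels)  # CrossEntropyLoss on logits
        loss.backward()
        optimizer.step()
```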

πŸ“ˆ Performance Summary

| Metric | Score |
|---|---|
| Accuracy | 0.71 |
| Macro F1 | 0.70 |
| Weighted F1 | 0.71 |

Classification Report

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Angry | 0.78 | 0.63 | 0.70 | 139 |
| Disgust | 0.77 | 0.79 | 0.78 | 128 |
| Fear | 0.50 | 0.58 | 0.54 | 114 |
| Happy | 0.95 | 0.92 | 0.94 | 129 |
| Neutral | 0.61 | 0.81 | 0.69 | 101 |
| Sad | 0.64 | 0.52 | 0.58 | 134 |

πŸ–ΌοΈ Visualizations

  • Training Accuracy and Loss — training and validation accuracy and loss curves.

  • Multi-Class ROC Curves — per-class ROC curves with AUC values.

  • Confusion Matrix — confusion matrix heatmap on the test set.


🧩 Model Files

| File | Description |
|---|---|
| emotion_sequence_transformer_mp478_seq256.pt | Original PyTorch model (architecture + weights) |
| emotion_sequence_transformer_mp478_seq256_weights.pt | Original PyTorch model weights only |

πŸš€ Usage and Preprocessing

To correctly use this model for prediction, you must first preprocess your video data using the provided assets for standardization and label encoding.

1. Preprocessing Assets

The necessary files for video preprocessing are stored in the assets/ folder of this repository:

| File Name | Purpose | Required for Step |
|---|---|---|
| emotion_label_encoder.joblib | Maps predicted indices back to human-readable emotion labels (e.g., 0 → 'Happy'). | Post-inference |
| global_mean_tensor.pt | Global mean tensor used to normalize the extracted MediaPipe features. | Preprocessing |
| global_std_tensor.pt | Global standard deviation tensor used to normalize the extracted MediaPipe features. | Preprocessing |

You must load the mean and standard deviation tensors to standardize your input feature sequences before feeding them into the model, as in the sketch below.
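
A minimal sketch of loading these assets and standardizing a sequence. It assumes the saved tensors broadcast against a (256, 478 × 3) feature sequence and that the joblib file holds a scikit-learn LabelEncoder:

```python
# Hedged sketch: load the normalization assets and standardize a sequence.
import joblib
import torch

mean = torch.load("assets/global_mean_tensor.pt")   # assumed to broadcast over frames
std = torch.load("assets/global_std_tensor.pt")
label_encoder = joblib.load("assets/emotion_label_encoder.joblib")

def standardize(seq: torch.Tensor) -> torch.Tensor:
    """Standardize a (256, 478*3) landmark sequence with the global statistics."""
    return (seq - mean) / std
```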

2. Complete Example

For a full, runnable demonstration showing how to load the model, use the assets for standardization, and run inference on a video, please refer to the usage notebook:

  • Notebook: emotion-sequence-transformer-bilstm-usage.ipynb

This file provides the complete code necessary to replicate the deployment environment.
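
For orientation, a hedged end-to-end inference sketch is shown below; the notebook remains the authoritative version. `extract_sequence` is a hypothetical helper that stacks 256 landmark frames (cf. the MediaPipe sketch above) into a (256, 1434) float tensor:

```python
# Hedged inference sketch; see the usage notebook for the complete pipeline.
import torch

model = torch.load("emotion_sequence_transformer_mp478_seq256.pt",
                   map_location="cpu", weights_only=False)  # full pickled model
model.eval()

seq = standardize(extract_sequence("clip.mp4"))     # hypothetical helper, (256, 1434)
with torch.no_grad():
    logits = model(seq.unsqueeze(0))                # add batch dim -> (1, 6)
    pred = int(logits.argmax(dim=1))

print(label_encoder.inverse_transform([pred])[0])   # e.g. 'Happy'
```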


πŸš€ Key Features

  • Real-time emotion recognition from MediaPipe landmarks
  • Transformer-based sequence modeling for dynamic human motion
  • Handles six primary emotion classes

🏷️ Tags

emotion-recognition transformer sequential-data mediapipe human-emotion deep-learning pytorch torchscript affective-computing fine-tuning real-time


πŸ‘€ Author & Model Info

Author: P.S. Abewickrama Singhe
Developed with: PyTorch
License: Apache-2.0
Date: October 2025


Evaluation results

  • Accuracy on the Optimized 478-Point 3D Facial Landmark Dataset (self-reported): 0.710