BindPred: Gradient Boosted Trees on ESM2 Embeddings
Model Overview
The BindPred model is a Gradient Boosted Trees (GBT) regressor trained on embeddings from Meta's ESM2 protein language model. It is designed for binding affinity prediction.
Available Pretrained Models:
ACE2_RBD_BindPred.json
Predicts binding affinity between ACE2 (from humans and other animals) and RBD proteins.
ESM2_BindPred.json
General-purpose GBT model trained on ESM2 embeddings.
Model Details
• Base Model: ESM2
• Architecture: Gradient Boosted Trees (CatBoostRegressor)
• Framework: CatBoost
• Task: Regression
How to Use
Download Model from Hugging Face
from huggingface_hub import hf_hub_download
# Download the general model
model_path = hf_hub_download(repo_id="hbp5181/BindPred", filename="ESM2_BindPred.cbm")
Load Model in CatBoost
from catboost import CatBoostRegressor
model = CatBoostRegressor()
model.load_model(model_path, format="cbm")
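Once the model is loaded, it needs embedding features to predict on. The sketch below shows one way to turn a protein sequence into a fixed-length ESM2 vector and pass it to the regressor. It is an illustration under stated assumptions, not the authors' pipeline: mean pooling over residues, the use of the final hidden layer, and the helper names `mean_pool`, `embed_sequence`, and `predict_affinity` are all hypothetical. This card does not say how the released models pool embeddings or combine the two binding partners, so the feature layout must be checked against the model before trusting any output.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def embed_sequence(sequence: str,
                   checkpoint: str = "facebook/esm2_t33_650M_UR50D") -> List[float]:
    """Return one mean-pooled ESM2 vector for a sequence (pooling is an assumption)."""
    # Heavy imports stay local so mean_pool works without torch/transformers.
    import torch
    from transformers import AutoTokenizer, EsmModel

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    esm = EsmModel.from_pretrained(checkpoint)
    esm.eval()
    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        hidden = esm(**inputs).last_hidden_state[0]  # shape: (tokens, hidden_size)
    # The ESM tokenizer wraps the sequence in <cls> ... <eos>; drop both.
    return mean_pool(hidden[1:-1].tolist())


def predict_affinity(model, feature_rows: Sequence[Sequence[float]]) -> List[float]:
    """Run a loaded regressor on one feature vector per row."""
    return [float(value) for value in model.predict(feature_rows)]
```

With a loaded `model` and a sequence string `seq`, a call would look like `predict_affinity(model, [embed_sequence(seq)])`, provided the number and order of features match what the model was trained on.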
Training Details
• Feature Extraction: ESM2 embeddings (33-layer transformer, 650M params)
• Training Algorithm: CatBoost Gradient Boosting
• Datasets:
ACE2 RBD: https://github.com/jbloomlab/SARSr-CoV_homolog_survey
General: https://zenodo.org/records/14271435
• Evaluation Metrics: RMSE, R^2
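For reference, the two evaluation metrics listed above can be computed from held-out targets and predictions as in the sketch below. It uses the standard definitions of RMSE and the coefficient of determination. The function names are illustrative, and nothing here reproduces the authors' evaluation script or reported scores.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between targets and predictions."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot
```

RMSE is in the same units as the affinity target, while R^2 is unitless and can be negative when predictions are worse than always predicting the mean.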
Applications
• Binding affinity predictions
Limitations & Considerations
• The model is trained on ESM2 embeddings and is limited by the quality of those embeddings.
• Performance depends on the training dataset used.
• The predictor itself is not a deep-learning model; it applies GBTs to ESM2 embeddings for fast, interpretable predictions.
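One way to inspect what a GBT model relies on is to rank its input features by importance. The sketch below assumes the importances come from CatBoost's `get_feature_importance()` on the loaded model. The helper `top_features` is hypothetical and not part of this repository. Note that each feature here is an embedding dimension, which may not map to a single biological property, so such rankings should be read with caution.

```python
from typing import List, Sequence, Tuple


def top_features(importances: Sequence[float], k: int = 10) -> List[Tuple[int, float]]:
    """Return the k (feature_index, importance) pairs with the largest importance."""
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Example with a loaded CatBoostRegressor named `model` (not executed here):
#   importances = model.get_feature_importance()
#   for index, score in top_features(importances, k=10):
#       print(f"embedding dimension {index}: {score:.3f}")
```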
Citation
👤 Maintainer: [email protected]
📅 Last Updated: February 2025
Model tree for hbp5181/BindPred
Base model
facebook/esm2_t33_650M_UR50D