BindPred: Gradient Boosted Trees on ESM2 Embeddings

Model Overview

BindPred is a Gradient Boosted Trees (GBT) regressor trained on embeddings from Meta’s ESM2 protein language model. It is designed for binding affinity prediction.

Available Pretrained Models:

ACE2_RBD_BindPred.json

Predicts binding affinity between ACE2 (human and other animal species) and receptor-binding domain (RBD) proteins.

ESM2_BindPred.json

General-purpose GBT model trained on ESM2 embeddings.

Model Details

• Base Model: ESM2

• Architecture: Gradient Boosted Trees (CatBoostRegressor)

• Framework: CatBoost

• Task: Regression

How to Use

Download Model from Hugging Face

from huggingface_hub import hf_hub_download

# Download the general model

model_path = hf_hub_download(repo_id="hbp5181/BindPred", filename="ESM2_BindPred.cbm")
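The file list above gives the model names with a .json extension, while the download call requests a .cbm file. The following is a minimal sketch for checking which file the repository actually holds before downloading. The helpers pick_model_file, catboost_format, and download_model are hypothetical names introduced for illustration; only list_repo_files and hf_hub_download come from huggingface_hub.

```python
from pathlib import PurePosixPath

REPO_ID = "hbp5181/BindPred"


def pick_model_file(repo_files, stem):
    """Return the first repository file whose name without extension equals `stem`."""
    for name in repo_files:
        if PurePosixPath(name).stem == stem:
            return name
    raise FileNotFoundError(f"no file named {stem}.* in the repository listing")


def catboost_format(filename):
    """Map a file extension to a CatBoost `format` string ("json" or "cbm")."""
    return "json" if PurePosixPath(filename).suffix.lower() == ".json" else "cbm"


def download_model(stem="ESM2_BindPred", repo_id=REPO_ID):
    """Download the model file matching `stem`; return (local_path, format)."""
    # Imported lazily so the two helpers above work without network access.
    from huggingface_hub import hf_hub_download, list_repo_files

    filename = pick_model_file(list_repo_files(repo_id), stem)
    local_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return local_path, catboost_format(filename)
```

The format string returned by download_model can be passed to load_model instead of a hard-coded value, so the same code works whichever extension the repository uses.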

Load Model in CatBoost

from catboost import CatBoostRegressor

model = CatBoostRegressor()

model.load_model(model_path, format="cbm")
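The loaded model expects numeric features derived from ESM2 embeddings, but this card does not state how the embeddings were pooled or, for the ACE2/RBD model, how the two proteins were combined. The sketch below shows one common way to obtain a fixed-length vector per sequence with the Hugging Face transformers library: mean-pooling the last hidden layer of the 650M-parameter checkpoint (facebook/esm2_t33_650M_UR50D). The pooling choice, the batch size, and the helper names batches and embed_sequences are assumptions, not the authors' documented pipeline.

```python
def batches(items, size):
    """Yield consecutive chunks of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_sequences(sequences, model_name="facebook/esm2_t33_650M_UR50D", batch_size=4):
    """Return one mean-pooled ESM2 vector per sequence as a NumPy array."""
    # Heavy dependencies are imported lazily so `batches` stays usable alone.
    import torch
    from transformers import AutoTokenizer, EsmModel

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    esm = EsmModel.from_pretrained(model_name).eval()

    pooled = []
    with torch.no_grad():
        for batch in batches(list(sequences), batch_size):
            inputs = tokenizer(batch, return_tensors="pt", padding=True)
            hidden = esm(**inputs).last_hidden_state       # (batch, tokens, hidden)
            mask = inputs["attention_mask"].unsqueeze(-1)  # 1 = real token, 0 = padding
            # Assumption: average over all non-padding tokens, special tokens included.
            pooled.append((hidden * mask).sum(dim=1) / mask.sum(dim=1))
    return torch.cat(pooled).numpy()
```

With a model loaded as above, model.predict(embed_sequences(sequences)) would return one value per input row, provided the feature layout matches the one used in training.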

Training Details

• Feature Extraction: ESM2 embeddings (33-layer transformer, 650M params)

• Training Algorithm: CatBoost Gradient Boosting

• Datasets:

  ACE2 RBD: https://github.com/jbloomlab/SARSr-CoV_homolog_survey

  General: https://zenodo.org/records/14271435

• Evaluation Metrics: RMSE, R^2
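For reference, the two evaluation metrics can be computed as in the sketch below. It assumes R^2 means the coefficient of determination (1 minus the ratio of residual to total sum of squares); the card does not say whether that or the squared Pearson correlation was reported. The function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Both functions take plain sequences of numbers, so they can be applied directly to held-out affinities and the output of model.predict converted to a list.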

Applications

• Binding affinity predictions

Limitations & Considerations

• Predictions are limited by the quality of the ESM2 embeddings the model is trained on.

• Performance depends on the training dataset; sequences unlike those seen in training may be predicted less accurately.

• The regressor itself is not a deep-learning model; it applies GBTs to ESM2 embeddings for fast, interpretable predictions.
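One way to look at the interpretability mentioned above is CatBoost's built-in feature importance. The sketch below ranks input features (here, embedding dimensions) by the values returned from get_feature_importance. The helper names top_features and rank_embedding_dimensions are hypothetical, and depending on how the model file was saved, CatBoost may require a data Pool to compute importances.

```python
def top_features(importances, k=10):
    """Return the k largest importances as (feature_index, value) pairs."""
    ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
    return [(index, float(value)) for index, value in ranked[:k]]


def rank_embedding_dimensions(model_path, k=10, model_format="cbm"):
    """Load a CatBoost model and list its k most influential input features."""
    # Imported lazily so `top_features` can be used without CatBoost installed.
    from catboost import CatBoostRegressor

    model = CatBoostRegressor()
    model.load_model(model_path, format=model_format)
    return top_features(model.get_feature_importance(), k)
```

The indices refer to positions in the feature vector passed to the model, so their meaning depends on how the embeddings were assembled.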

Citation

👤 Maintainer: [email protected]

📅 Last Updated: February 2025
