Protein-Protein Interaction Site Prediction
This model is a fine-tuned version of ESM2-3B [1] for protein-protein interaction site prediction. For each amino acid in a protein sequence, it predicts whether the residue is part of an interaction site (1) or not (0).
For more details on the training and testing of this model, refer to the article [...].
The GitHub repository to use with this model is available here: https://github.com/RitAreaSciencePark/PPI-Reps
The data used to train and evaluate this model is available in CSV format in this Zenodo repository: https://doi.org/10.5281/zenodo.18802482
How to Get Started with the Model
import torch
from transformers import AutoModel, AutoTokenizer, AutoConfig
model_name = "evillegasgarcia/esm2-ppi-biolip-1"
# Load config
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
# Load model using the custom remote code
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Move model to device
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
# Run over a sample sequence
sequence = "MKTVRQERLKSIVRILEAAKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
inputs = tokenizer.encode(sequence, return_tensors="pt").to(device)
# Disable gradient tracking, since this is inference only
with torch.no_grad():
    logits = model(inputs)["logits"]
# Convert logits to probabilities of belonging to an interaction site
probabilities = torch.sigmoid(logits)
print(probabilities)
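To turn the probabilities into the 0/1 labels described above, a decision threshold is needed. The following is a minimal sketch, not part of the released code: the helper `label_residues` and the 0.5 threshold are assumptions chosen for illustration, and it expects exactly one probability per residue as a plain Python list (for example, obtained by flattening the tensor with `.tolist()`). If the model output also contains positions for special tokens, those would have to be dropped first.

```python
from typing import List, Tuple


def label_residues(
    sequence: str, probs: List[float], threshold: float = 0.5
) -> List[Tuple[int, str, float, int]]:
    """Pair each residue with its probability and a binary label.

    Hypothetical helper: the 0.5 default threshold is an assumption,
    not a value taken from the model card.
    """
    if len(sequence) != len(probs):
        raise ValueError("expected exactly one probability per residue")
    # Positions are 1-based, following the usual convention for residues
    return [
        (i + 1, aa, p, int(p >= threshold))
        for i, (aa, p) in enumerate(zip(sequence, probs))
    ]


# Made-up probabilities for a five-residue peptide, for illustration only
example = label_residues("MKTVR", [0.12, 0.81, 0.47, 0.50, 0.93])
interface_positions = [pos for pos, _, _, label in example if label == 1]
print(interface_positions)  # [2, 4, 5]
```

A threshold tuned on a validation set may give a better balance between precision and recall than a fixed 0.5.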
Training Details
The model was trained on a curated subset of the BioLiP dataset taken from [2]. We used the Adam optimizer with default hyperparameters, apart from a weight decay of 0.05. The learning rate was 1e-5, and the gradient accumulation batch size was 2.
Evaluation
The model was evaluated on the ZK448 benchmark, available from the Zenodo repository and originally curated by [3]. It achieves an accuracy of 0.74 and a Matthews correlation coefficient (MCC) of 0.35.
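For readers who want to compute the same two metrics on their own predictions, here is a minimal sketch using only the standard library. The function names and the toy labels are illustrative assumptions; this is not the evaluation script used for the reported numbers. MCC is often more informative than accuracy for this kind of task, because it accounts for all four confusion-matrix cells even when one class is much rarer than the other.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when the denominator vanishes
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy per-residue labels, made up for illustration
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
acc = accuracy(y_true, y_pred)
score = mcc(y_true, y_pred)
print(f"{acc:.2f} {score:.3f}")  # 0.75 0.467
```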
References
[1] Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., ... & Rives, A. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637), 1123-1130.
[2] Zhang, J., & Kurgan, L. (2018). Review and comparative assessment of sequence-based predictors of protein-binding residues. Briefings in Bioinformatics, 19(5), 821-837.