Longevity Protein Classifier
Fine-tuned ESM-2 150M for binary classification of protein sequences as longevity-associated or not. Trained on multi-species GenAge data with LoRA adapters. Built to connect protein language models to longevity biology.
Performance
| Metric | Value |
|---|---|
| Test AUPRC | 0.335 |
| Test AUC-ROC | 0.696 |
| Random baseline AUPRC | 0.061 |
| Improvement over random | 5.5x |
| Best epoch | 10 of 20 |
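The "Improvement over random" row is the ratio of the two AUPRC values above it. A minimal check of that arithmetic, assuming the random baseline was taken as the positive-class prevalence in the test set (the expected AUPRC of a random classifier):

```python
# Values copied from the Performance table.
test_auprc = 0.335
random_baseline_auprc = 0.061  # assumed to be the test-set positive prevalence

fold_improvement = test_auprc / random_baseline_auprc
print(f"Improvement over random: {fold_improvement:.1f}x")
```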
Benchmark Results
| Protein | Score | Pass/Fail | Notes |
|---|---|---|---|
| SIRT1 | 0.996 | PASS | NAD+ deacetylase, caloric restriction |
| SIRT3 | 0.998 | PASS | Mitochondrial sirtuin |
| TP53 | 0.974 | PASS | Tumour suppressor, aging roles |
| MYH9 | 0.000 | PASS | Negative control: structural myosin |
| ACTB | 0.000 | PASS | Negative control: beta actin |
| ALB | 0.000 | PASS | Negative control: serum albumin |
| FOXO3 | 0.000 | FAIL | Known limitation: see below |
| MTOR | 0.000 | FAIL | Known limitation: truncated at 512aa |
| TERT | 0.000 | FAIL | Known limitation: truncated at 512aa |
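The Pass/Fail column can be reproduced from the scores with a simple rule: a positive control passes if it scores at or above the decision threshold, and a negative control passes if it scores below it. The sketch below assumes the 0.06 default threshold was used for the benchmark; the card does not state which threshold was applied.

```python
THRESHOLD = 0.06  # assumption: the default threshold from this card

# (protein, score, expected_positive) copied from the benchmark table.
benchmark = [
    ("SIRT1", 0.996, True),
    ("SIRT3", 0.998, True),
    ("TP53", 0.974, True),
    ("MYH9", 0.000, False),
    ("ACTB", 0.000, False),
    ("ALB", 0.000, False),
    ("FOXO3", 0.000, True),
    ("MTOR", 0.000, True),
    ("TERT", 0.000, True),
]

def passes(score, expected_positive, threshold=THRESHOLD):
    """A benchmark entry passes when the prediction matches the expectation."""
    predicted_positive = score >= threshold
    return predicted_positive == expected_positive

results = {name: passes(score, expected) for name, score, expected in benchmark}
n_pass = sum(results.values())
print(f"{n_pass} of {len(benchmark)} benchmark proteins pass")
```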
Novel Predictions Not in GenAge
Proteins scoring above 0.50 that are not in the GenAge human database. These are model predictions only and have not been experimentally validated.
| Protein | Score | Biological relevance |
|---|---|---|
| NEIL1 | 0.951 | DNA repair of oxidative damage. DNA repair capacity correlates with species lifespan |
| GRHL1 | 0.880 | Epithelial barrier maintenance. Tissue integrity declines with age |
| GSTA1 | 0.871 | Glutathione S-transferase antioxidant. GST family implicated in longevity across species |
| EXO1 | 0.550 | DNA mismatch repair exonuclease |
| MSH4 | 0.546 | DNA mismatch repair. Related family members MSH2 and MSH6 are established longevity genes |
| TFEB | 0.502 | Master regulator of autophagy and lysosomal biogenesis. Overexpression extends lifespan in C. elegans. Regulated by mTOR |
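TFEB is the most biologically compelling novel prediction, even though its score of 0.502 is the lowest in the table and only just clears the 0.50 cutoff. It is mechanistically connected to mTOR (already in GenAge), yet it was flagged by a model trained with no pathway information.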
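The selection rule for this table (score above 0.50 and absent from GenAge human) can be sketched as a filter over a dictionary of scores. The `known_genage` set below is a tiny illustrative stand-in, not the real database, and `novel_hits` is a hypothetical helper name.

```python
def novel_hits(scores, known, cutoff=0.50):
    """Return (protein, score) pairs above the cutoff that are not already known,
    sorted from highest to lowest score."""
    hits = [(name, s) for name, s in scores.items() if s > cutoff and name not in known]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Scores copied from the tables in this card.
scores = {"SIRT1": 0.996, "TP53": 0.974, "NEIL1": 0.951, "TFEB": 0.502, "ALB": 0.000}

# Illustrative stand-in for the GenAge human gene list (assumption, not the full set).
known_genage = {"SIRT1", "TP53", "MTOR"}

candidates = novel_hits(scores, known_genage)
print(candidates)
```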
Recommended Thresholds
| Use case | Threshold |
|---|---|
| Screening: maximise recall | 0.05 |
| Balanced: default | 0.06 |
| High confidence hits only | 0.50 |
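One way to apply these thresholds is to keep them in a lookup table and pick one per use case. This is a minimal sketch; the dictionary keys and the `classify` helper are illustrative names, not part of the released code.

```python
# Thresholds copied from the table above; the key names are hypothetical.
THRESHOLDS = {
    "screening": 0.05,
    "balanced": 0.06,
    "high_confidence": 0.50,
}

def classify(probability, use_case="balanced"):
    """Label a score as Longevity or Non-longevity for the chosen use case."""
    threshold = THRESHOLDS[use_case]
    return "Longevity" if probability >= threshold else "Non-longevity"

print(classify(0.055, "screening"))
print(classify(0.055, "balanced"))
```

The same score can change label between use cases, so it is worth recording which threshold was used alongside any reported hit list.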
Known Limitations
1. Protein length. Sequences longer than 512 amino acids are truncated from the C-terminus. This causes failures on long proteins where the functional domain sits in the C-terminal half. MTOR (2,549 aa) and TERT (1,132 aa) both fail for this reason. Do not use this model on proteins above 800 amino acids without validating first.
2. Family-specific blind spots. The model learned sirtuin and tumour suppressor sequence features well but has insufficient training examples to generalise to forkhead transcription factors. FOXO3 (402 aa, fits within the 512 window) scores 0.000 despite being a canonical longevity gene. This is a training data coverage problem, not a truncation problem.
3. Direction of effect not captured. The model cannot distinguish pro-longevity proteins (overexpression extends lifespan) from anti-aging-disease proteins (loss of function accelerates aging). A high score means "associated with longevity biology", not "activating this protein extends lifespan".
4. Not for clinical use. Research screening tool only. Do not use for clinical, diagnostic, or therapeutic decisions.
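The length limits in limitation 1 can be checked before scoring. The sketch below is a hypothetical pre-screen that uses the 512 and 800 amino acid figures from this card; the function name and the returned labels are illustrative.

```python
WINDOW = 512          # residues the model sees, as stated in limitation 1
VALIDATE_ABOVE = 800  # above this length the card advises validating first

def length_check(sequence):
    """Classify a sequence by how the length limits in this card apply to it."""
    n = len(sequence)
    if n <= WINDOW:
        return "ok"              # fits in the window, no truncation expected
    if n <= VALIDATE_ABOVE:
        return "truncated"       # C-terminal residues are dropped
    return "validate_first"      # long protein, scores may be unreliable

print(length_check("M" * 402))
print(length_check("M" * 1132))
```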
How to Use
from transformers import AutoTokenizer, EsmForSequenceClassification
from peft import PeftModel
import torch

# Load the ESM-2 150M base with a 2-class head, then attach the LoRA adapter.
base = EsmForSequenceClassification.from_pretrained(
    "facebook/esm2_t30_150M_UR50D",
    num_labels=2,
    ignore_mismatched_sizes=True,
)
model = PeftModel.from_pretrained(base, "mawe/longevity-esm2-v6")
tokenizer = AutoTokenizer.from_pretrained("mawe/longevity-esm2-v6")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

MAX_LENGTH = 512  # tokens, including the two special tokens the ESM tokenizer adds

def score_sequence(sequence, threshold=0.06):
    inputs = tokenizer(
        sequence,
        max_length=MAX_LENGTH,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(
            input_ids=inputs["input_ids"].to(device),
            attention_mask=inputs["attention_mask"].to(device),
        )
    # Probability of the positive (longevity) class.
    prob = torch.softmax(outputs.logits, dim=1)[:, 1].item()
    # Only MAX_LENGTH - 2 residues fit once the special tokens are counted.
    truncated = len(sequence) > MAX_LENGTH - 2
    return {
        "probability": round(prob, 4),
        "prediction": "Longevity" if prob >= threshold else "Non-longevity",
        "warning": "Sequence truncated to fit 512 tokens" if truncated else None,
    }
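To score several proteins at once, the sequences can be read from FASTA text and ranked by probability. The sketch below is one possible wrapper: `parse_fasta` and `rank_sequences` are hypothetical helpers, and the scorer is passed in as an argument so that `score_sequence` from above can be plugged in. A stub scorer stands in for the model here so the example runs without downloading weights.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {record_id: sequence} dictionary."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else "unnamed"), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records

def rank_sequences(records, score_fn):
    """Score every record and return (record_id, probability), highest first."""
    scored = [(name, score_fn(seq)["probability"]) for name, seq in records.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Placeholder scorer for illustration only; replace with score_sequence.
def stub_scorer(sequence):
    return {"probability": round(len(sequence) / 100, 4)}

fasta_text = """>protA example one
MKTAYIAKQR
QISFVKSHFS
>protB
MKT
"""
records = parse_fasta(fasta_text)
ranking = rank_sequences(records, stub_scorer)
print(ranking)
```

With the real model loaded, `rank_sequences(records, score_sequence)` would return the same structure with model probabilities.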
Training Details
Positive set: GenAge database
- Human GenAge: 306 genes (all entries)
- C. elegans Pro-Longevity: 283 genes
- D. melanogaster Pro-Longevity: 125 genes
- M. musculus Pro-Longevity: 85 genes
- Total positives: 574
Negative set: Swiss-Prot reviewed proteins
- NEG_RATIO: 10 negatives per positive
- Species weights: human 2.0x, mouse 1.5x, worm and fly 1.0x
- Necessary-for-fitness genes excluded from universe
- Anti-Longevity genes excluded from positives
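The negative sampling described above can be sketched as drawing a fixed number of negatives per positive from a candidate pool, after removing positives and excluded genes. This is an illustration under assumptions: the function name, the identifiers, and the uniform sampling are hypothetical, and the species weights are not modelled because the card does not say how they were applied.

```python
import random

NEG_RATIO = 10  # negatives per positive, as listed above

def sample_negatives(positive_ids, candidate_ids, excluded_ids=(), neg_ratio=NEG_RATIO, seed=0):
    """Draw up to neg_ratio negatives per positive from the candidate pool,
    skipping positives and any explicitly excluded identifiers."""
    positives = set(positive_ids)
    blocked = positives | set(excluded_ids)
    pool = sorted(c for c in set(candidate_ids) if c not in blocked)
    k = min(len(pool), neg_ratio * len(positives))
    return random.Random(seed).sample(pool, k)

# Toy identifiers for illustration only.
positives = ["P1", "P2"]
candidates = ["P1"] + [f"N{i}" for i in range(50)]
negatives = sample_negatives(positives, candidates, excluded_ids=["N0"])
print(len(negatives))
```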
Architecture: ESM-2 150M + LoRA r=16, alpha=32, dropout=0.15
Loss: Focal loss gamma=1.0, label smoothing=0.1, contrastive margin=0.30
Optimiser: AdamW lr=2e-4, cosine schedule, 10% warmup
Hardware: NVIDIA T4 16GB on Kaggle
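For readers unfamiliar with focal loss, the per-example form is FL = -(1 - p_t)^gamma * log(p_t), where p_t is the probability assigned to the true class. The sketch below shows only that term with gamma=1.0 in plain Python; it is not the training code, and it omits the label smoothing and contrastive margin listed above because the card does not specify how they were combined.

```python
import math

def focal_loss(p_positive, label, gamma=1.0):
    """Per-example binary focal loss.

    p_positive: predicted probability of the positive class.
    label: 1 for positive, 0 for negative.
    gamma: focusing parameter; gamma=0 reduces to cross-entropy.
    """
    p_t = p_positive if label == 1 else 1.0 - p_positive
    p_t = min(max(p_t, 1e-7), 1.0)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# A confident correct prediction is down-weighted more than an uncertain one.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.6, 1)
print(easy < hard)
```

Down-weighting easy examples is one common way to handle a class imbalance like the 10:1 negative-to-positive ratio used here.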
Experiment History
| Version | Key change | Test AUPRC |
|---|---|---|
| v1 | Frozen encoder, 186 positives | Collapsed |
| v2 | LoRA r=8, 277 positives | 0.027 |
| v3 | ESM-2 150M, multi-species, 2000 positives | 0.302 |
| v4 | Pro-Longevity filter, focal loss gamma=2 | 0.250 |
| v5 | Cleaned species, gamma=1, label smoothing | 0.323 |
| v6 | Pathway-stratified split, contrastive margin | 0.335 |
Contact
Feedback and collaboration welcome.