Longevity Protein Classifier v6

Fine-tuned ESM-2 150M for binary classification of protein sequences as longevity-associated or not, trained on multi-species GenAge data with LoRA adapters.

Built as part of a personal ML learning arc (Week 3 of 8) connecting protein language models to longevity biology.


Model Description

  • Model type: ESM-2 150M + LoRA (r=16) sequence classifier
  • Base model: facebook/esm2_t30_150M_UR50D
  • Task: Binary classification (longevity-associated vs non-longevity)
  • Developed by: Mo Elzek
  • License: Apache 2.0

Performance

| Metric | Value |
|---|---|
| Test AUPRC | 0.335 |
| Test AUC-ROC | 0.696 |
| Random AUPRC baseline | 0.061 |
| Improvement over random | 5.5x |
| Training epochs | 10 (early stopping) |
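For context, a random ranker's expected AUPRC equals the positive-class prevalence, which is where the 0.061 baseline comes from. A minimal sketch with scikit-learn, using synthetic labels and scores (not the real test set):

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
# Synthetic labels at roughly the card's positive prevalence (~6.1%).
y_true = (rng.random(1000) < 0.061).astype(int)
# Synthetic scores loosely correlated with the labels.
y_score = np.clip(rng.normal(0.1 + 0.4 * y_true, 0.2), 0.0, 1.0)

auprc = average_precision_score(y_true, y_score)
baseline = y_true.mean()  # a random ranker's expected AUPRC
print(f"AUPRC={auprc:.3f}  baseline={baseline:.3f}  lift={auprc / baseline:.1f}x")
```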

Benchmark Results

| Protein | Score | Expected | Notes |
|---|---|---|---|
| SIRT1 | 0.996 | HIGH | NAD+ deacetylase, caloric restriction mediator |
| SIRT3 | 0.998 | HIGH | Mitochondrial sirtuin |
| TP53 | 0.974 | HIGH | Tumour suppressor, aging roles |
| MYH9 | 0.000 | LOW | Structural myosin (negative control) |
| ACTB | 0.000 | LOW | Beta actin (negative control) |
| ALB | 0.000 | LOW | Serum albumin (negative control) |
| FOXO3 | 0.000 | HIGH | Fails (see limitations) |
| MTOR | 0.000 | HIGH | Fails (see limitations) |
| TERT | 0.000 | HIGH | Fails (see limitations) |

Novel Predictions Not in GenAge

Proteins scoring above 0.50 that are not present in the GenAge human database. These are the model's predictions of longevity-relevant proteins not yet catalogued there; they are not validated findings.

| Protein | Score | Biological relevance |
|---|---|---|
| TFEB | 0.502 | Master regulator of autophagy and lysosomal biogenesis. Overexpression extends lifespan in C. elegans. Regulated by mTOR. Strongest literature support among the novel predictions. |
| NEIL1 | 0.951 | DNA glycosylase, base excision repair of oxidative damage. DNA repair capacity correlates with species lifespan. |
| GSTA1 | 0.871 | Glutathione S-transferase. Antioxidant defence. GST family implicated in longevity across multiple species. |
| GRHL1 | 0.880 | Grainyhead-like transcription factor. Epithelial barrier maintenance; tissue integrity declines with age. |
| EXO1 | 0.550 | Exonuclease involved in DNA mismatch repair and double-strand break repair. |
| MSH4 | 0.546 | DNA mismatch repair. Related family members (MSH2, MSH6) are established longevity-associated genes. |

Recommended Thresholds

| Use case | Threshold | Precision | Recall |
|---|---|---|---|
| Screening (cast a wide net) | 0.05 | ~20% | ~29% |
| Balanced | 0.06 | ~41% | ~29% |
| High-confidence hits only | 0.50 | ~61% | ~24% |

Optimised threshold from the validation set: 0.06 (F1 = 0.358).

The model produces a bimodal score distribution: proteins it recognises score very high (above 0.50), while proteins it does not score near zero. The flat recall curve from 0.05 to 0.70 reflects this; most longevity proteins are either clearly found or clearly missed.
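The 0.06 operating point can be recovered from validation scores with scikit-learn's precision-recall curve. A sketch on synthetic scores (the real validation arrays are not published):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(1)
# Synthetic validation set: positives drawn from a higher-scoring distribution.
y_true = (rng.random(2000) < 0.09).astype(int)
y_score = np.where(y_true == 1, rng.beta(2.0, 5.0, 2000), rng.beta(1.0, 20.0, 2000))

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
f1 = 2 * precision * recall / np.clip(precision + recall, 1e-9, None)
best = int(np.argmax(f1[:-1]))  # the final (precision, recall) point has no threshold
print(f"best threshold={thresholds[best]:.3f}  F1={f1[best]:.3f}")
```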


Known Limitations (Read Before Use)

1. Protein length truncation

Sequences longer than 512 amino acids are truncated from the C-terminus. This causes systematic failures on long proteins where the functional domain sits in the C-terminal half:

  • MTOR (2,549 aa): kinase domain at residues 2,181-2,431, truncated away
  • TERT (1,132 aa): reverse transcriptase domain at roughly residues 600-900, truncated away

Do not use this model to score proteins above 800 amino acids without validating on known examples from that protein family first.
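One possible workaround, not part of the released model, is to score overlapping windows and take the maximum so a C-terminal domain still gets seen. Note this changes score calibration and is untested here. The window of 510 assumes the ESM tokenizer spends two of the 512 positions on special tokens; `score_fn` is any callable mapping a sequence to a probability (e.g. a wrapper around the scoring code in the usage section).

```python
def score_long_protein(sequence, score_fn, window=510, stride=255):
    """Score overlapping windows and return the maximum, so that a
    C-terminal domain in a long protein is not silently dropped."""
    if len(sequence) <= window:
        return score_fn(sequence)
    starts = list(range(0, len(sequence) - window, stride))
    starts.append(len(sequence) - window)  # always cover the C-terminus
    return max(score_fn(sequence[s:s + window]) for s in starts)
```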

2. Family-specific blind spots

The model learned sirtuin and tumour suppressor sequence features well but has insufficient training examples to generalise to:

  • Forkhead transcription factors (FOXO3 scores 0.000 despite being a canonical longevity gene and fitting within the 512 aa window)
  • Large kinases (truncation compounds this)
  • Telomerase complex proteins

3. Direction of effect not captured

The model cannot distinguish between:

  • Pro-longevity proteins (overexpression extends lifespan)
  • Anti-aging-disease proteins (loss of function accelerates aging)

Both may score high. A high score means "associated with longevity biology", not "activating this protein extends lifespan".

4. Not validated experimentally

Novel predictions are model outputs only. No wet-lab validation has been performed. TFEB is the strongest prediction based on prior literature, but this model did not discover TFEB; it independently ranked it highly, consistent with existing biology.

5. Not for clinical use

This is a research screening tool. Do not use for any clinical, diagnostic, or therapeutic decision-making.


Training Data

Positive set: GenAge database (genomics.senescence.info)

  • Human GenAge: 306 human longevity-associated genes
  • Model organism GenAge: Pro-Longevity genes only, from three model organisms
    • C. elegans: 283 genes
    • D. melanogaster: 125 genes
    • M. musculus: 85 genes
  • Total positives: ~574

Negative set: Swiss-Prot reviewed proteins from same species

  • Sampled proportionally per species (NEG_RATIO=10)
  • Species weights applied: human 2.0x, mouse 1.5x, worm/fly 1.0x
  • "Necessary for fitness" genes excluded from the sampling universe entirely
  • Anti-Longevity genes excluded from positives
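The per-species proportional sampling can be sketched as follows. The frames are hypothetical stand-ins for the real GenAge and Swiss-Prot tables, and the species weights (2.0x human, etc.) are assumed to be applied later as loss weights during training rather than at sampling time:

```python
import pandas as pd

NEG_RATIO = 10  # negatives sampled per positive, within each species

# Hypothetical stand-ins for the GenAge positives and the Swiss-Prot universe.
positives = pd.DataFrame(
    {"species": ["human"] * 30 + ["worm"] * 28 + ["fly"] * 12 + ["mouse"] * 8}
)
universe = pd.DataFrame({
    "species": ["human"] * 2000 + ["worm"] * 1500 + ["fly"] * 1000 + ["mouse"] * 1200,
    "accession": range(5700),
})

neg_frames = []
for sp, n_pos in positives["species"].value_counts().items():
    pool = universe[universe["species"] == sp]
    n_neg = min(len(pool), n_pos * NEG_RATIO)  # proportional per species
    neg_frames.append(pool.sample(n=n_neg, random_state=42))
negatives = pd.concat(neg_frames, ignore_index=True)
print(negatives["species"].value_counts().to_dict())
```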

Filtering:

  • Sequence length: 50-1500 amino acids
  • Swiss-Prot reviewed only (manually curated)

Training Procedure

Architecture: ESM-2 150M + LoRA adapters

  • LoRA rank: r=16, alpha=32, dropout=0.15
  • Target modules: query, value attention projections
  • Trainable parameters: ~4.7M of 150M total (3.1%)
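The adapter setup above maps onto peft roughly as in this config fragment. The `task_type` and exact module names are assumptions based on standard ESM-2 + peft usage, not the author's code:

```python
from peft import LoraConfig, TaskType

# LoRA hyperparameters as reported above; wrap the base classifier with
# peft.get_peft_model(base_model, lora_cfg) before training.
lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.15,
    target_modules=["query", "value"],  # attention q/v projections
)
```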

Loss function: Focal loss with contrastive margin penalty

  • gamma=1.0 (softer than standard gamma=2.0)
  • Label smoothing=0.1
  • Contrastive margin=0.30 (explicit separation penalty)
  • Class weights: balanced
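The pieces above combine roughly as in this sketch. The card does not publish the loss implementation, so the hinge on the batch-mean score gap is one plausible reading of the contrastive margin, not the author's exact code:

```python
import torch
import torch.nn.functional as F

def focal_loss_with_margin(logits, labels, gamma=1.0, smoothing=0.1,
                           margin=0.30, class_weights=None):
    # Label-smoothed cross-entropy, per example.
    ce = F.cross_entropy(logits, labels, weight=class_weights,
                         label_smoothing=smoothing, reduction="none")
    # Focal modulation: down-weight examples the model already gets right.
    p_t = torch.softmax(logits, dim=1).gather(1, labels.unsqueeze(1)).squeeze(1)
    focal = ((1.0 - p_t) ** gamma) * ce

    # Contrastive margin (assumed form): penalise the batch when the mean
    # positive-class score of positives does not exceed that of negatives
    # by at least `margin`.
    p_pos = torch.softmax(logits, dim=1)[:, 1]
    pos, neg = p_pos[labels == 1], p_pos[labels == 0]
    penalty = logits.new_zeros(())
    if pos.numel() and neg.numel():
        penalty = F.relu(margin - (pos.mean() - neg.mean()))
    return focal.mean() + penalty
```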

  • Optimiser: AdamW, lr=2e-4, weight_decay=0.01
  • Schedule: Cosine with warmup (10% warmup steps)
  • Early stopping: Patience=4 on val AUPRC
  • Best epoch: 10 of 20

  • Hardware: NVIDIA T4 16GB (Kaggle)
  • Training time: ~2 hours
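A minimal instantiation of this optimiser and schedule; the step count is hypothetical and the linear layer stands in for the LoRA-wrapped model:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

total_steps = 2000             # hypothetical; depends on dataset and batch size
model = torch.nn.Linear(8, 2)  # stand-in for the LoRA-wrapped ESM-2 classifier

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.10 * total_steps),  # 10% warmup
    num_training_steps=total_steps,
)
```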


How to Use

```python
from transformers import AutoTokenizer, EsmForSequenceClassification
from peft import PeftModel
import torch

# Load model
base = EsmForSequenceClassification.from_pretrained(
    "facebook/esm2_t30_150M_UR50D",
    num_labels=2,
    ignore_mismatched_sizes=True
)
model = PeftModel.from_pretrained(base, "YOUR_USERNAME/longevity-esm2-v6")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/longevity-esm2-v6")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

def score_sequence(sequence, threshold=0.06):
    inputs = tokenizer(
        sequence,
        max_length=512,
        padding="max_length",
        truncation=True,
        return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(
            input_ids=inputs["input_ids"].to(device),
            attention_mask=inputs["attention_mask"].to(device)
        )
        prob = torch.softmax(outputs.logits, dim=1)[:, 1].item()
    return {
        "probability": round(prob, 4),
        "prediction": "Longevity" if prob >= threshold else "Non-longevity",
        "threshold": threshold,
        "warning": "Truncated to 512 aa" if len(sequence) > 512 else None
    }

# Example
result = score_sequence("MKTAYIAKQRQISFVK...")
print(result)
```

Recommended thresholds:

  • 0.05-0.06 for screening (maximise recall)
  • 0.50 for high-confidence hits only

Experiment History

This model is v6 in a series of iterative experiments:

| Version | Key change | Test AUPRC |
|---|---|---|
| v1 | Frozen encoder, 186 positives | Collapsed |
| v2 | LoRA r=8, 277 positives | 0.027 |
| v3 | ESM-2 150M, multi-species, ~2000 positives | 0.302 |
| v4 | Pro-Longevity filter, focal loss gamma=2 | 0.250 |
| v5 | Cleaned species, gamma=1, label smoothing | 0.323 |
| v6 (this) | Pathway-stratified split, contrastive margin | 0.335 |

Citation

If you use this model in research, please cite:

```bibtex
@misc{elzek2026longevity,
  author    = {Elzek, Mo},
  title     = {Longevity Protein Classifier: Multi-species ESM-2 Fine-tuning},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/YOUR_USERNAME/longevity-esm2-v6}
}
```


Contact

Built by Mo Elzek as part of the London Longevity Network ML project arc.
Feedback and collaboration welcome.
