Longevity Protein Classifier v6

Fine-tuned ESM-2 150M for binary classification of protein sequences as longevity-associated or not, trained on multi-species GenAge data with LoRA adapters.

Built as part of a personal ML learning arc (Week 3 of 8) connecting protein language models to longevity biology.


Model Description

  • Model type: ESM-2 150M + LoRA (r=16) sequence classifier
  • Base model: facebook/esm2_t30_150M_UR50D
  • Task: Binary classification (longevity-associated vs non-longevity)
  • Developed by: Mo Elzek
  • License: Apache 2.0

Performance

| Metric | Value |
|---|---|
| Test AUPRC | 0.335 |
| Test AUC-ROC | 0.696 |
| Random AUPRC baseline | 0.061 |
| Improvement over random | 5.5x |
| Training epochs | 10 (early stopping) |
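For context, a random ranker's expected AUPRC equals the positive-class prevalence, which is where the 0.061 baseline comes from. A minimal sketch with scikit-learn, using synthetic labels and scores (not the real test set):

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
# Synthetic labels at roughly the card's positive prevalence (~6.1%).
y_true = (rng.random(1000) < 0.061).astype(int)
# Synthetic scores loosely correlated with the labels.
y_score = np.clip(rng.normal(0.1 + 0.4 * y_true, 0.2), 0.0, 1.0)

auprc = average_precision_score(y_true, y_score)
baseline = y_true.mean()  # a random ranker's expected AUPRC
print(f"AUPRC={auprc:.3f}  baseline={baseline:.3f}  lift={auprc / baseline:.1f}x")
```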

Benchmark Results

| Protein | Score | Expected | Notes |
|---|---|---|---|
| SIRT1 | 0.996 | HIGH | NAD+ deacetylase, caloric restriction mediator |
| SIRT3 | 0.998 | HIGH | Mitochondrial sirtuin |
| TP53 | 0.974 | HIGH | Tumour suppressor, aging roles |
| MYH9 | 0.000 | LOW | Structural myosin (negative control) |
| ACTB | 0.000 | LOW | Beta actin (negative control) |
| ALB | 0.000 | LOW | Serum albumin (negative control) |
| FOXO3 | 0.000 | HIGH | Fails (see limitations) |
| MTOR | 0.000 | HIGH | Fails (see limitations) |
| TERT | 0.000 | HIGH | Fails (see limitations) |

Novel Predictions Not in GenAge

Proteins scoring above 0.50 that are not present in the GenAge human database. These are the model's predictions of longevity-relevant proteins not yet catalogued there; they are not validated findings.

| Protein | Score | Biological relevance |
|---|---|---|
| TFEB | 0.502 | Master regulator of autophagy and lysosomal biogenesis. Overexpression extends lifespan in C. elegans. Regulated by mTOR. Strongest literature support among the novel predictions. |
| NEIL1 | 0.951 | DNA glycosylase, base excision repair of oxidative damage. DNA repair capacity correlates with species lifespan. |
| GSTA1 | 0.871 | Glutathione S-transferase. Antioxidant defence. GST family implicated in longevity across multiple species. |
| GRHL1 | 0.880 | Grainyhead-like transcription factor. Epithelial barrier maintenance; tissue integrity declines with age. |
| EXO1 | 0.550 | Exonuclease involved in DNA mismatch repair and double-strand break repair. |
| MSH4 | 0.546 | DNA mismatch repair. Related family members (MSH2, MSH6) are established longevity-associated genes. |

Recommended Thresholds

| Use case | Threshold | Precision | Recall |
|---|---|---|---|
| Screening (cast a wide net) | 0.05 | ~20% | ~29% |
| Balanced | 0.06 | ~41% | ~29% |
| High-confidence hits only | 0.50 | ~61% | ~24% |

Optimised threshold from the validation set: 0.06 (F1 = 0.358).

The model produces a bimodal score distribution: proteins it recognises score very high (above 0.50), while proteins it does not score near zero. The flat recall curve from 0.05 to 0.70 reflects this; most longevity proteins are either clearly found or clearly missed.
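The 0.06 operating point can be recovered from validation scores with scikit-learn's precision-recall curve. A sketch on synthetic scores (the real validation arrays are not published):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(1)
# Synthetic validation set: positives drawn from a higher-scoring distribution.
y_true = (rng.random(2000) < 0.09).astype(int)
y_score = np.where(y_true == 1, rng.beta(2.0, 5.0, 2000), rng.beta(1.0, 20.0, 2000))

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
f1 = 2 * precision * recall / np.clip(precision + recall, 1e-9, None)
best = int(np.argmax(f1[:-1]))  # the final (precision, recall) point has no threshold
print(f"best threshold={thresholds[best]:.3f}  F1={f1[best]:.3f}")
```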


Known Limitations (Read Before Use)

1. Protein length truncation

Sequences longer than 512 amino acids are truncated from the C-terminus. This causes systematic failures on long proteins where the functional domain sits in the C-terminal half:

  • MTOR (2,549 aa): kinase domain at residues 2,181-2,431, truncated away
  • TERT (1,132 aa): reverse transcriptase domain at roughly residues 600-900, truncated away

Do not use this model to score proteins above 800 amino acids without validating on known examples from that protein family first.
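One possible workaround, not part of the released model, is to score overlapping windows and take the maximum so a C-terminal domain still gets seen. Note this changes score calibration and is untested here. The window of 510 assumes the ESM tokenizer spends two of the 512 positions on special tokens; `score_fn` is any callable mapping a sequence to a probability (e.g. a wrapper around the scoring code in the usage section).

```python
def score_long_protein(sequence, score_fn, window=510, stride=255):
    """Score overlapping windows and return the maximum, so that a
    C-terminal domain in a long protein is not silently dropped."""
    if len(sequence) <= window:
        return score_fn(sequence)
    starts = list(range(0, len(sequence) - window, stride))
    starts.append(len(sequence) - window)  # always cover the C-terminus
    return max(score_fn(sequence[s:s + window]) for s in starts)
```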

2. Family-specific blind spots

The model learned sirtuin and tumour suppressor sequence features well but has insufficient training examples to generalise to:

  • Forkhead transcription factors (FOXO3 scores 0.000 despite being a canonical longevity gene and fitting within the 512 aa window)
  • Large kinases (truncation compounds this)
  • Telomerase complex proteins

3. Direction of effect not captured

The model cannot distinguish between:

  • Pro-longevity proteins (overexpression extends lifespan)
  • Anti-aging-disease proteins (loss of function accelerates aging)

Both may score high. A high score means "associated with longevity biology", not "activating this protein extends lifespan".

4. Not validated experimentally

Novel predictions are model outputs only. No wet-lab validation has been performed. TFEB is the strongest prediction based on prior literature, but this model did not discover TFEB; it independently ranked it highly, consistent with existing biology.

5. Not for clinical use

This is a research screening tool. Do not use for any clinical, diagnostic, or therapeutic decision-making.


Training Data

Positive set: GenAge database (genomics.senescence.info)

  • Human GenAge: 306 human longevity-associated genes
  • Model organism GenAge: Pro-Longevity genes only, from three model organisms
    • C. elegans: 283 genes
    • D. melanogaster: 125 genes
    • M. musculus: 85 genes
  • Total positives: ~574

Negative set: Swiss-Prot reviewed proteins from same species

  • Sampled proportionally per species (NEG_RATIO=10)
  • Species weights applied: human 2.0x, mouse 1.5x, worm/fly 1.0x
  • "Necessary for fitness" genes excluded from the sampling universe entirely
  • Anti-Longevity genes excluded from positives
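The per-species proportional sampling can be sketched as follows. The frames are hypothetical stand-ins for the real GenAge and Swiss-Prot tables, and the species weights (2.0x human, etc.) are assumed to be applied later as loss weights during training rather than at sampling time:

```python
import pandas as pd

NEG_RATIO = 10  # negatives sampled per positive, within each species

# Hypothetical stand-ins for the GenAge positives and the Swiss-Prot universe.
positives = pd.DataFrame(
    {"species": ["human"] * 30 + ["worm"] * 28 + ["fly"] * 12 + ["mouse"] * 8}
)
universe = pd.DataFrame({
    "species": ["human"] * 2000 + ["worm"] * 1500 + ["fly"] * 1000 + ["mouse"] * 1200,
    "accession": range(5700),
})

neg_frames = []
for sp, n_pos in positives["species"].value_counts().items():
    pool = universe[universe["species"] == sp]
    n_neg = min(len(pool), n_pos * NEG_RATIO)  # proportional per species
    neg_frames.append(pool.sample(n=n_neg, random_state=42))
negatives = pd.concat(neg_frames, ignore_index=True)
print(negatives["species"].value_counts().to_dict())
```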

Filtering:

  • Sequence length: 50-1500 amino acids
  • Swiss-Prot reviewed only (manually curated)

Training Procedure

Architecture: ESM-2 150M + LoRA adapters

  • LoRA rank: r=16, alpha=32, dropout=0.15
  • Target modules: query, value attention projections
  • Trainable parameters: ~4.7M of 150M total (3.1%)
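The adapter setup above maps onto peft roughly as in this config fragment. The `task_type` and exact module names are assumptions based on standard ESM-2 + peft usage, not the author's code:

```python
from peft import LoraConfig, TaskType

# LoRA hyperparameters as reported above; wrap the base classifier with
# peft.get_peft_model(base_model, lora_cfg) before training.
lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.15,
    target_modules=["query", "value"],  # attention q/v projections
)
```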

Loss function: Focal loss with contrastive margin penalty

  • gamma=1.0 (softer than standard gamma=2.0)
  • Label smoothing=0.1
  • Contrastive margin=0.30 (explicit separation penalty)
  • Class weights: balanced
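The pieces above combine roughly as in this sketch. The card does not publish the loss implementation, so the hinge on the batch-mean score gap is one plausible reading of the contrastive margin, not the author's exact code:

```python
import torch
import torch.nn.functional as F

def focal_loss_with_margin(logits, labels, gamma=1.0, smoothing=0.1,
                           margin=0.30, class_weights=None):
    # Label-smoothed cross-entropy, per example.
    ce = F.cross_entropy(logits, labels, weight=class_weights,
                         label_smoothing=smoothing, reduction="none")
    # Focal modulation: down-weight examples the model already gets right.
    p_t = torch.softmax(logits, dim=1).gather(1, labels.unsqueeze(1)).squeeze(1)
    focal = ((1.0 - p_t) ** gamma) * ce

    # Contrastive margin (assumed form): penalise the batch when the mean
    # positive-class score of positives does not exceed that of negatives
    # by at least `margin`.
    p_pos = torch.softmax(logits, dim=1)[:, 1]
    pos, neg = p_pos[labels == 1], p_pos[labels == 0]
    penalty = logits.new_zeros(())
    if pos.numel() and neg.numel():
        penalty = F.relu(margin - (pos.mean() - neg.mean()))
    return focal.mean() + penalty
```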

  • Optimiser: AdamW, lr=2e-4, weight_decay=0.01
  • Schedule: Cosine with warmup (10% warmup steps)
  • Early stopping: Patience=4 on val AUPRC
  • Best epoch: 10 of 20

  • Hardware: NVIDIA T4 16GB (Kaggle)
  • Training time: ~2 hours
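A minimal instantiation of this optimiser and schedule; the step count is hypothetical and the linear layer stands in for the LoRA-wrapped model:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

total_steps = 2000             # hypothetical; depends on dataset and batch size
model = torch.nn.Linear(8, 2)  # stand-in for the LoRA-wrapped ESM-2 classifier

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.10 * total_steps),  # 10% warmup
    num_training_steps=total_steps,
)
```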


How to Use

```python
from transformers import AutoTokenizer, EsmForSequenceClassification
from peft import PeftModel
import torch

# Load model
base = EsmForSequenceClassification.from_pretrained(
    "facebook/esm2_t30_150M_UR50D",
    num_labels=2,
    ignore_mismatched_sizes=True
)
model = PeftModel.from_pretrained(base, "YOUR_USERNAME/longevity-esm2-v6")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/longevity-esm2-v6")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

def score_sequence(sequence, threshold=0.06):
    inputs = tokenizer(
        sequence,
        max_length=512,
        padding="max_length",
        truncation=True,
        return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(
            input_ids=inputs["input_ids"].to(device),
            attention_mask=inputs["attention_mask"].to(device)
        )
        prob = torch.softmax(outputs.logits, dim=1)[:, 1].item()
    return {
        "probability": round(prob, 4),
        "prediction": "Longevity" if prob >= threshold else "Non-longevity",
        "threshold": threshold,
        "warning": "Truncated to 512 aa" if len(sequence) > 512 else None
    }

# Example
result = score_sequence("MKTAYIAKQRQISFVK...")
print(result)
```

Recommended thresholds:

  • 0.05-0.06 for screening (maximise recall)
  • 0.50 for high-confidence hits only

Experiment History

This model is v6 in a series of iterative experiments:

| Version | Key change | Test AUPRC |
|---|---|---|
| v1 | Frozen encoder, 186 positives | Collapsed |
| v2 | LoRA r=8, 277 positives | 0.027 |
| v3 | ESM-2 150M, multi-species, ~2000 positives | 0.302 |
| v4 | Pro-Longevity filter, focal loss gamma=2 | 0.250 |
| v5 | Cleaned species, gamma=1, label smoothing | 0.323 |
| v6 (this) | Pathway-stratified split, contrastive margin | 0.335 |

Citation

If you use this model in research, please cite:

```bibtex
@misc{elzek2026longevity,
  author    = {Elzek, Mo},
  title     = {Longevity Protein Classifier: Multi-species ESM-2 Fine-tuning},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/YOUR_USERNAME/longevity-esm2-v6}
}
```


Contact

Built by Mo Elzek as part of the London Longevity Network ML project arc.
Feedback and collaboration welcome.
