Longevity Protein Classifier

ESM-2 150M fine-tuned for binary classification of protein sequences as longevity-associated or not. The model was trained on multi-species GenAge data with LoRA adapters and is intended to connect protein language models to longevity biology.

Performance

Metric	Value
Test AUPRC	0.335
Test AUC-ROC	0.696
Random baseline AUPRC	0.061
Improvement over random	5.5x
Best epoch	10 of 20
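
The improvement figure is the ratio of test AUPRC to the random baseline. For randomly assigned scores the expected AUPRC equals the fraction of positives in the test set, which is presumably where the 0.061 baseline comes from. A minimal sketch of that arithmetic, using only the values in the table (the function name is illustrative and not part of the released code):

```python
def fold_over_random(auprc: float, baseline_auprc: float) -> float:
    """Ratio of a model's AUPRC to the random-classifier baseline."""
    if baseline_auprc <= 0:
        raise ValueError("baseline AUPRC must be positive")
    return auprc / baseline_auprc


# Values copied from the Performance table.
test_auprc = 0.335
random_auprc = 0.061  # assumed to be the positive fraction of the test set

improvement = fold_over_random(test_auprc, random_auprc)
print(f"{improvement:.1f}x over random")
```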

Benchmark Results

Protein	Score	Pass/Fail	Notes
SIRT1	0.996	PASS	NAD+ deacetylase, caloric restriction
SIRT3	0.998	PASS	Mitochondrial sirtuin
TP53	0.974	PASS	Tumour suppressor, aging roles
MYH9	0.000	PASS	Negative control — structural myosin
ACTB	0.000	PASS	Negative control — beta actin
ALB	0.000	PASS	Negative control — serum albumin
FOXO3	0.000	FAIL	Known limitation — see below
MTOR	0.000	FAIL	Known limitation — truncated at 512aa
TERT	0.000	FAIL	Known limitation — truncated at 512aa
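
The Pass/Fail column can be reproduced from the scores with a simple rule: known longevity proteins should score above a cutoff, negative controls below it. The sketch below assumes a cutoff of 0.50, which the card does not state explicitly; with these scores any cutoff between the negative controls and 0.974 gives the same outcome.

```python
CUTOFF = 0.50  # assumption: the card does not say which cutoff defines PASS/FAIL

# (protein, score, is_known_longevity_protein), copied from the benchmark table.
BENCHMARK = [
    ("SIRT1", 0.996, True),
    ("SIRT3", 0.998, True),
    ("TP53", 0.974, True),
    ("MYH9", 0.000, False),
    ("ACTB", 0.000, False),
    ("ALB", 0.000, False),
    ("FOXO3", 0.000, True),
    ("MTOR", 0.000, True),
    ("TERT", 0.000, True),
]


def passes(score: float, expected_positive: bool, cutoff: float = CUTOFF) -> bool:
    """A benchmark passes when the thresholded prediction matches the expectation."""
    return (score >= cutoff) == expected_positive


results = {name: passes(score, expected) for name, score, expected in BENCHMARK}
n_pass = sum(results.values())
print(f"{n_pass}/{len(results)} benchmarks pass")
print("Failures:", [name for name, ok in results.items() if not ok])
```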

Novel Predictions Not in GenAge

Proteins scoring above 0.50 that are not in the GenAge human database. These are model predictions only — not experimentally validated.

Protein	Score	Biological relevance
NEIL1	0.951	DNA repair of oxidative damage. DNA repair capacity correlates with species lifespan
GRHL1	0.880	Epithelial barrier maintenance. Tissue integrity declines with age
GSTA1	0.871	Glutathione S-transferase antioxidant. GST family implicated in longevity across species
EXO1	0.550	DNA mismatch repair exonuclease
MSH4	0.546	DNA mismatch repair. Related family members MSH2 and MSH6 are established longevity genes
TFEB	0.502	Master regulator of autophagy and lysosomal biogenesis. Overexpression extends lifespan in C. elegans. Regulated by mTOR

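TFEB is the most mechanistically compelling novel prediction, even though its score (0.502) is the lowest in the table and only just clears the 0.50 cutoff. TFEB is regulated by mTOR, which is already in GenAge, yet it was flagged from sequence alone by a model trained with no pathway information.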
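
The selection rule for this table (score above 0.50, gene absent from GenAge human) can be sketched as a simple filter. The `genage_human` set below is a two-gene placeholder for illustration only; in practice it would be loaded from the GenAge human gene list. The function name is hypothetical.

```python
# Scores copied from the tables in this card.
scores = {
    "SIRT1": 0.996,  # benchmark protein, included to show the GenAge exclusion
    "NEIL1": 0.951,
    "GRHL1": 0.880,
    "GSTA1": 0.871,
    "TFEB": 0.502,
    "EXO1": 0.550,
    "MSH4": 0.546,
}

# Placeholder subset; replace with the full GenAge human gene symbols.
genage_human = {"SIRT1", "MTOR"}


def novel_hits(scores, known, cutoff=0.50):
    """Return (gene, score) pairs above the cutoff and not in `known`, best first."""
    hits = [(gene, s) for gene, s in scores.items() if s > cutoff and gene not in known]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


hits = novel_hits(scores, genage_human)
for gene, s in hits:
    print(f"{gene}\t{s:.3f}")
```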

Recommended Thresholds

Use case	Threshold
Screening — maximise recall	0.05
Balanced — default	0.06
High confidence hits only	0.50
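
One way to apply these thresholds together is to map each probability to the strictest use case it satisfies. This is a sketch; the tier names and the function are illustrative, only the three numeric thresholds come from the table above.

```python
SCREENING = 0.05        # maximise recall
BALANCED = 0.06         # default
HIGH_CONFIDENCE = 0.50  # high confidence hits only


def confidence_tier(prob: float) -> str:
    """Return the strictest tier whose threshold the probability meets."""
    if prob >= HIGH_CONFIDENCE:
        return "high-confidence"
    if prob >= BALANCED:
        return "balanced"
    if prob >= SCREENING:
        return "screening-only"
    return "below-threshold"


for p in (0.951, 0.502, 0.055, 0.0):
    print(f"{p:.3f} -> {confidence_tier(p)}")
```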

Known Limitations

1. Protein length. Sequences longer than 512 amino acids are truncated at the C-terminal end, so residues beyond the window are never seen by the model. This causes failures on long proteins whose functional domain sits in the C-terminal half: MTOR (2,549 aa) and TERT (1,132 aa) both fail for this reason. Do not use this model on proteins above 800 amino acids without validating first.

2. Family-specific blind spots. The model learned sirtuin and tumour suppressor sequence features well but has too few training examples to generalise to forkhead transcription factors. FOXO3 (402 aa, fits within 512 window) scores 0.000 despite being a canonical longevity gene. This is a training data coverage problem, not a truncation problem.

3. Direction of effect not captured. The model cannot distinguish pro-longevity proteins (overexpression extends lifespan) from proteins that protect against age-related decline (loss of function accelerates aging). A high score means "associated with longevity biology", not "activating this protein extends lifespan".

4. Not for clinical use. This is a research screening tool only. Do not use it for clinical, diagnostic, or therapeutic decisions.
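
For the protein length limitation, one possible workaround is to score overlapping windows instead of only the N-terminal segment. This is not the method used to produce any result in this card and it has not been validated for this model; the window size of 510 residues (512 tokens minus the two special tokens the ESM tokenizer adds), the stride, and the max-aggregation are all assumptions. A sketch of the windowing step:

```python
def windows(sequence: str, size: int = 510, stride: int = 255):
    """Yield (start, subsequence) windows that together cover the whole sequence."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if len(sequence) <= size:
        yield 0, sequence
        return
    start = 0
    while True:
        end = start + size
        if end >= len(sequence):
            # Anchor the last window at the C-terminus so no residues are dropped.
            yield len(sequence) - size, sequence[-size:]
            return
        yield start, sequence[start:end]
        start += stride


def aggregate(window_scores):
    """Assumed aggregation: the protein score is the best-scoring window."""
    return max(window_scores)


# Dummy sequence with the same length as TERT (1,132 aa), for illustration only.
dummy = "A" * 1132
starts = [start for start, _ in windows(dummy)]
print(starts)
```

Each window could then be passed to `score_sequence` from the How to Use section and the per-window probabilities combined with `aggregate`. Scores obtained this way are not comparable to the benchmark numbers above without separate validation.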

How to Use

from transformers import AutoTokenizer, EsmForSequenceClassification
from peft import PeftModel
import torch

base = EsmForSequenceClassification.from_pretrained(
    "facebook/esm2_t30_150M_UR50D",
    num_labels=2,
    ignore_mismatched_sizes=True
)
model = PeftModel.from_pretrained(base, "mawe/longevity-esm2-v6")
tokenizer = AutoTokenizer.from_pretrained("mawe/longevity-esm2-v6")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

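def score_sequence(sequence, threshold=0.06):
    """Return the longevity probability and thresholded label for one protein sequence."""
    # The ESM tokenizer adds <cls> and <eos>, so at most max_length - 2 = 510 residues are kept.
    max_length = 512
    inputs = tokenizer(
        sequence,
        max_length=max_length,
        padding="max_length",
        truncation=True,
        return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(
            input_ids=inputs["input_ids"].to(device),
            attention_mask=inputs["attention_mask"].to(device)
        )
        # Class index 1 is the longevity-associated class.
        prob = torch.softmax(outputs.logits, dim=1)[:, 1].item()
    truncated = len(sequence) > max_length - 2
    return {
        "probability": round(prob, 4),
        "prediction": "Longevity" if prob >= threshold else "Non-longevity",
        "warning": "Sequence truncated to fit the 512-token window" if truncated else None
    }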
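
To score several proteins at once, the sequences first need to be read from a file. The sketch below is a minimal FASTA reader using only the standard library; the record names and sequences in the example are made up, and the function name is illustrative.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: uppercase_sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Use the first word of the header as the ID, or a fallback name.
            current = fields[0] if fields else f"record_{len(records) + 1}"
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts).upper() for name, parts in records.items()}


# Made-up example records, not real proteins.
example = """>protA hypothetical example
MKT
AYI
>protB
gsh
""".splitlines()

sequences = parse_fasta(example)
print(sequences)
```

With the model loaded as above, each entry can then be scored with `score_sequence(seq)` in a loop over `sequences.items()`. For a real file, pass an open file handle instead of the example lines.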

Training Details

Positive set: GenAge database

Human GenAge: 306 genes (all entries)
C. elegans Pro-Longevity: 283 genes
D. melanogaster Pro-Longevity: 125 genes
M. musculus Pro-Longevity: 85 genes
Total positives: 574

Negative set: Swiss-Prot reviewed proteins

NEG_RATIO: 10 negatives per positive
Species weights: human 2.0x, mouse 1.5x, worm and fly 1.0x
Necessary-for-fitness genes excluded from universe
Anti-Longevity genes excluded from positives
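
The negative sampling described above can be sketched as follows. The card does not publish the sampling code, so this is an illustration under assumptions: negatives are drawn uniformly at random without replacement, the identifiers are made up, and how the species weights enter training (per-example loss weights versus resampling) is not stated, so they are shown only as a lookup table.

```python
import random

NEG_RATIO = 10  # negatives per positive, from the list above
SPECIES_WEIGHT = {"human": 2.0, "mouse": 1.5, "worm": 1.0, "fly": 1.0}


def sample_negatives(positives, candidates, excluded, ratio=NEG_RATIO, seed=0):
    """Draw up to ratio * len(positives) negatives, skipping positives and excluded IDs."""
    pool = sorted(set(candidates) - set(positives) - set(excluded))
    k = min(len(pool), ratio * len(positives))
    return random.Random(seed).sample(pool, k)


# Made-up identifiers for illustration.
positives = ["P1", "P2"]
candidates = [f"N{i}" for i in range(50)] + positives
excluded = ["N0", "N1"]  # e.g. necessary-for-fitness genes

negatives = sample_negatives(positives, candidates, excluded)
print(len(negatives), "negatives for", len(positives), "positives")
```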

Architecture: ESM-2 150M + LoRA r=16, alpha=32, dropout=0.15

Loss: Focal loss gamma=1.0, label smoothing=0.1, contrastive margin=0.30
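
As a rough illustration of the classification part of this loss, the sketch below combines focal weighting with label smoothing for a single example in plain Python. It shows one common formulation, not necessarily the one used in training; the contrastive margin term is omitted because the card does not describe how it is computed.

```python
import math


def smoothed_targets(label: int, smoothing: float = 0.1, num_classes: int = 2):
    """Soft targets: (1 - smoothing) on the true class plus smoothing / num_classes on every class."""
    off = smoothing / num_classes
    return [off + (1.0 - smoothing) * (1.0 if c == label else 0.0) for c in range(num_classes)]


def focal_loss(probs, label: int, gamma: float = 1.0, smoothing: float = 0.1) -> float:
    """Focal loss against smoothed targets: -sum_c q_c * (1 - p_c)^gamma * log(p_c)."""
    targets = smoothed_targets(label, smoothing, len(probs))
    eps = 1e-12  # guards against log(0)
    return -sum(q * (1.0 - p) ** gamma * math.log(max(p, eps)) for p, q in zip(probs, targets))


# An easy example (p=0.9 on the true class) is down-weighted more than a hard one (p=0.6).
print(focal_loss([0.1, 0.9], label=1))
print(focal_loss([0.4, 0.6], label=1))
```

With gamma=0 and smoothing=0 the function reduces to ordinary cross-entropy, which is a quick sanity check on the formula.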

Optimiser: AdamW lr=2e-4, cosine schedule, 10% warmup
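
The learning-rate schedule can be sketched as a function of the training step. This assumes linear warmup from zero and cosine decay to zero, the same shape as the cosine-with-warmup scheduler in Hugging Face Transformers; the total step count in the example is hypothetical.

```python
import math


def lr_at(step: int, total_steps: int, peak_lr: float = 2e-4, warmup_frac: float = 0.10) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


total = 1000  # hypothetical number of optimiser steps
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step, total):.2e}")
```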

Hardware: NVIDIA T4 16GB on Kaggle

Experiment History

Version	Key change	Test AUPRC
v1	Frozen encoder, 186 positives	Collapsed
v2	LoRA r=8, 277 positives	0.027
v3	ESM-2 150M, multi-species, 2000 positives	0.302
v4	Pro-Longevity filter, focal loss gamma=2	0.250
v5	Cleaned species, gamma=1, label smoothing	0.323
v6	Pathway-stratified split, contrastive margin	0.335

Contact

Feedback and collaboration welcome.

