Evolla-10B-hf

A frontier protein-language generative model: because proteins deserve better small talk.

Live Demo | Paper on bioRxiv | Evolla in Hugging Face Transformers | Post on X

Model Description

Evolla is an 80-billion-parameter protein-language generative model (also released in 10B variants) designed to decode the molecular language of proteins. It integrates protein sequences, structures, and user queries to generate precise, contextually nuanced insights into protein function.

This specific repository contains the 10B parameter model, trained with Causal Protein-Language Modeling (CPLM).

Note: This set of model parameters is specifically formatted for the 🤗 Transformers library! If you want to use Evolla with our original custom repository, please check Evolla-10B or Evolla-10B-DPO.

Usage with 🤗 Transformers

You can load and use Evolla-10B-hf directly using the standard Hugging Face API. Please ensure that your aa_seq and foldseek sequences have the exact same length.

import torch
from transformers import EvollaProcessor, EvollaForProteinText2Text

model_id = "westlake-repl/Evolla-10B-hf"

# Load processor and model
processor = EvollaProcessor.from_pretrained(model_id)
model = EvollaForProteinText2Text.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16
).eval()

# 1. Prepare protein structural information
# Note: aa_seq should have the same length as foldseek. 
# Use '#' for low-confidence foldseek tokens.
protein_inputs = [
    {
        "aa_seq": "MATGGRRG...",
        "foldseek": "###lqpfd..." 
    }
]

# 2. Prepare chat messages
messages_list = [
    [
        {"role": "system", "content": "You are an AI expert that can answer any questions about protein."},
        {"role": "user", "content": "What is the function of this protein?"}
    ]
]

# 3. Process inputs
inputs = processor(
    proteins=protein_inputs, 
    messages_list=messages_list, 
    return_tensors="pt", 
    text_max_length=512, 
    protein_max_length=1024
).to(model.device)

# 4. Generate response
with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=256)

generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_texts)
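Because the processor expects each `aa_seq` to align one-to-one with its `foldseek` string, a quick sanity check before calling it can save a confusing downstream error. The helper below is a minimal sketch; `validate_protein_inputs` is our own name, not part of the Transformers API:

```python
def validate_protein_inputs(protein_inputs):
    """Check that each aa_seq aligns one-to-one with its foldseek string.

    Raises ValueError for the first protein whose lengths disagree.
    """
    for i, protein in enumerate(protein_inputs):
        aa, fs = protein["aa_seq"], protein["foldseek"]
        if len(aa) != len(fs):
            raise ValueError(
                f"protein {i}: aa_seq has {len(aa)} residues but "
                f"foldseek has {len(fs)} tokens; they must match"
            )

# Lengths match (4 residues, 4 foldseek tokens), so this passes silently.
validate_protein_inputs([{"aa_seq": "MATG", "foldseek": "##lq"}])
```

Running this before `processor(...)` turns a shape mismatch into an immediate, readable error.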

Official Support in 🤗 Transformers

Evolla has been officially integrated into the 🤗 Transformers library! You no longer need a custom fork or a specific branch to run this model.

To use Evolla, simply ensure you have the latest version of transformers installed:

pip install --upgrade transformers

For detailed API references, advanced configurations, and more examples, please check out the Official Evolla Documentation in Transformers.
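If you are unsure whether your installed build is recent enough, you can probe for the Evolla classes directly instead of pinning a version number. This is our own heuristic helper, not part of the library:

```python
def transformers_supports_evolla():
    """Return True if the installed transformers build exposes the Evolla classes.

    Heuristic sketch: checks for the class attribute rather than comparing
    version strings, so it also works with development installs.
    """
    try:
        import transformers
    except ImportError:
        return False  # transformers is not installed at all
    return hasattr(transformers, "EvollaForProteinText2Text")

print(transformers_supports_evolla())
```

If this prints `False`, upgrade with `pip install --upgrade transformers` and re-run the check.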

Citation

If you find Evolla useful in your research, please cite our paper:

@article{zhou2025decoding,
  title={Decoding the molecular language of proteins with {Evolla}},
  author={Zhou, Xibin and Han, Chenchen and Zhang, Yingqi and Du, Huan and Tian, Jiayuan and Su, Jin and Liu, Renju and Zhuang, Kai and Jiang, Shiyu and Gitter, Anthony and others},
  journal={bioRxiv},
  pages={2025--01},
  year={2025},
  publisher={Cold Spring Harbor Laboratory}
}