license: apache-2.0
tags:
- chemistry
- biology
pipeline_tag: other
DISCO (DIffusion for Sequence-structure CO-design) is a multimodal generative model that simultaneously co-designs protein sequences and 3D structures, conditioned on and co-folded with arbitrary biomolecules — including small-molecule ligands, DNA, and RNA. Unlike sequential pipelines that first generate a backbone and then apply inverse folding, DISCO generates both modalities jointly, enabling sequence-based objectives to inform structure generation and vice versa.
The model was introduced in the paper "General Multimodal Protein Design Enables DNA-Encoding of Chemistry".
Sample Usage
To run inference, first follow the installation instructions in the official GitHub repository. You can then run generation using the provided runner:
python runner/inference.py \
experiment=designable \
input_json_path=input_jsons/unconditional_config.json \
seeds=\[0,1,2,3,4\]
Key Parameters:
- experiment: Use designable (steers toward samples more likely to refold correctly) or diverse (produces greater structural variety).
- input_json_path: Path to the JSON file describing the generation target (masked sequences, ligands, etc.).
- effort: Use max for full quality (200 diffusion steps, 4 recycling cycles) or fast for prototyping.
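When launching several runs from a script, it can help to assemble the command programmatically and check the parameter values before starting. The following is a minimal sketch, not part of the DISCO repository: the helper name build_inference_command is hypothetical, and passing effort as an effort=... override is an assumption modelled on the key=value form of the other arguments shown above.

```python
import shlex

# Values documented in the Key Parameters list above.
VALID_EXPERIMENTS = ("designable", "diverse")
VALID_EFFORTS = ("max", "fast")


def build_inference_command(experiment, input_json_path, seeds, effort=None):
    """Return the argv list for runner/inference.py (hypothetical helper)."""
    if experiment not in VALID_EXPERIMENTS:
        raise ValueError(f"experiment must be one of {VALID_EXPERIMENTS}")
    if effort is not None and effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}")

    # Seeds are passed as a bracketed, comma-separated list, e.g. [0,1,2].
    seed_list = "[" + ",".join(str(int(s)) for s in seeds) + "]"

    cmd = [
        "python",
        "runner/inference.py",
        f"experiment={experiment}",
        f"input_json_path={input_json_path}",
        f"seeds={seed_list}",
    ]
    if effort is not None:
        # Assumption: effort is given in the same key=value form as the others.
        cmd.append(f"effort={effort}")
    return cmd


cmd = build_inference_command(
    "designable",
    "input_jsons/unconditional_config.json",
    seeds=range(5),
    effort="fast",
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

Because cmd is a list of arguments, it can be passed to subprocess.run(cmd, check=True) from the repository root without shell escaping of the brackets in the seed list.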
Abstract
Evolution is an extraordinary engine for enzymatic diversity, yet the chemistry it has explored remains a narrow slice of what DNA can encode. Deep generative models can design new proteins that bind ligands, but none have created enzymes without pre-specifying catalytic residues. We introduce DISCO (DIffusion for Sequence-structure CO-design), a multimodal model that co-designs protein sequence and 3D structure around arbitrary biomolecules. Conditioned solely on reactive intermediates, DISCO designs diverse heme enzymes with novel active-site geometries that catalyze new-to-nature carbene-transfer reactions with high activities exceeding those of engineered enzymes.
Citation
@Article{disco2026,
title={General Multimodal Protein Design Enables DNA-Encoding of Chemistry},
author={Jarrid Rector-Brooks and Théophile Lambert and Marta Skreta and Daniel Roth and Yueming Long and Zi-Qi Li and Xi Zhang and Miruna Cretu and Francesca-Zhoufan Li and Tanvi Ganapathy and Emily Jin and Avishek Joey Bose and Jason Yang and Kirill Neklyudov and Yoshua Bengio and Alexander Tong and Frances H. Arnold and Cheng-Hao Liu},
year={2026},
eprint={2604.05181},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2604.05181},
}