# TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation

TD3B is a sequence-based generative framework that designs peptide binders with a specified agonist or antagonist behavior. It combines a Direction Oracle, a soft binding-affinity gate, and amortized fine-tuning of a pre-trained discrete diffusion model (MDLM).

## Installation

```bash
conda env create -f env.yml
conda activate td3b
pip install -e .
```

## Data and Checkpoints

Download the pre-trained checkpoints and data from [Google Drive (TBA)](placeholder_link).

Place the files as follows:

```
TD3B/
├── checkpoints/
│   ├── pretrained.ckpt              # Pre-trained MDLM weights
│   ├── td3b.ckpt                    # Fine-tuned TD3B model
│   └── direction_oracle.pt          # Direction Oracle weights
├── data/
│   ├── train.csv                    # Training set (target-binder pairs)
│   └── test.csv                     # Test set
├── scoring/functions/classifiers/
│   ├── binding-affinity.pt
│   ├── hemolysis-xgboost.json
│   ├── nonfouling-xgboost.json
│   ├── permeability-xgboost.json
│   └── solubility-xgboost.json
└── tokenizer/
    ├── new_vocab.txt
    └── new_splits.txt
```
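
A small standalone check can confirm the layout before you run anything. The sketch below is hypothetical: neither `EXPECTED_FILES` nor `missing_files` exists in the repository. It lists which files from the tree above are absent, relative to the repository root.

```python
from pathlib import Path

# Hypothetical list mirroring the layout tree above (paths relative to TD3B/).
EXPECTED_FILES = [
    "checkpoints/pretrained.ckpt",
    "checkpoints/td3b.ckpt",
    "checkpoints/direction_oracle.pt",
    "data/train.csv",
    "data/test.csv",
    "scoring/functions/classifiers/binding-affinity.pt",
    "scoring/functions/classifiers/hemolysis-xgboost.json",
    "scoring/functions/classifiers/nonfouling-xgboost.json",
    "scoring/functions/classifiers/permeability-xgboost.json",
    "scoring/functions/classifiers/solubility-xgboost.json",
    "tokenizer/new_vocab.txt",
    "tokenizer/new_splits.txt",
]


def missing_files(root: str = ".") -> list:
    """Return the expected files that are not present under `root`, in listed order."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected files are in place.")
```

If you save it as, for example, `check_layout.py` (a hypothetical name) and run it from the `TD3B/` directory, it prints any paths that still need to be downloaded or moved.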

## Code Structure

```
TD3B/
├── inference.py                     # Generate binders (main inference entry point)
├── finetune_multi_target.py         # Multi-target TD3B training
├── finetune_utils.py                # Training utilities
├── launch_multi_target.sh           # Training launcher script
├── diffusion.py                     # MDLM backbone (TR2-D2)
├── roformer.py                      # RoFormer wrapper
├── noise_schedule.py                # Noise schedules
├── peptide_mcts.py                  # MCTS tree search
├── td3b/
│   ├── direction_oracle.py          # Direction Oracle (f_φ)
│   ├── td3b_scoring.py              # Gated reward R = g_ψ · σ(d*·(f_φ − 0.5)/τ)
│   ├── td3b_losses.py               # L_WDCE + λ·L_ctr + β·L_KL
│   ├── td3b_mcts.py                 # TD3B-extended MCTS
│   ├── td3b_finetune.py             # Training loop
│   └── data_utils.py                # Data loading utilities
├── scoring/                         # Affinity predictor (g_ψ) and property classifiers
├── baselines/                       # CG, SMC, TDS, PepTune, Unguided baselines
├── tokenizer/                       # SMILES tokenizer (vocab + splits)
├── configs/                         # Model and training configs
└── utils/                           # Misc utilities
```
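
The gated reward noted for `td3b_scoring.py` multiplies the affinity score by a temperature-scaled sigmoid of the Direction Oracle output. The following is a minimal scalar sketch of that formula, not the repository's implementation. It assumes three things: the oracle returns the probability of agonist behavior in [0, 1], the requested direction `d*` is +1 for agonist and -1 for antagonist, and the affinity score is already scaled to [0, 1]. `gated_reward` is a hypothetical name.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_reward(affinity: float, oracle_prob: float, direction: int, tau: float = 0.1) -> float:
    """R = affinity * sigmoid(d* * (oracle_prob - 0.5) / tau).

    affinity:    affinity-predictor score, assumed scaled to [0, 1]
    oracle_prob: Direction Oracle output, assumed to be P(agonist) in [0, 1]
    direction:   requested direction d*, assumed +1 (agonist) or -1 (antagonist)
    tau:         sigmoid temperature (SIGMOID_TEMPERATURE in the launcher)
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 (agonist) or -1 (antagonist)")
    return affinity * sigmoid(direction * (oracle_prob - 0.5) / tau)
```

For example, `gated_reward(0.8, 0.9, 1)` is about 0.786, and requesting the opposite direction (`-1`) drops it to about 0.014. A small temperature (0.1 by default) makes the direction term behave almost like a step function around 0.5. The multiplicative affinity gate keeps weak binders low whatever their direction.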

## Inference

Generate agonist and antagonist binders for target proteins:

```bash
python inference.py \
  --ckpt_path checkpoints/td3b.ckpt \
  --val_csv data/test.csv \
  --save_path results/ \
  --seed 42 \
  --num_pool 32 \
  --val_samples_per_target 8 \
  --resample_alpha 0.1
```

This generates 32 candidates (`--num_pool`) per (target, direction) pair and scores them with the Direction Oracle and the affinity predictor. It then applies the weighted resampling of Algorithm 2 in the paper and saves only valid peptide samples.
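
The exact weighting used by Algorithm 2 is defined in the paper and in `inference.py`. The sketch below is only a rough illustration built on two assumptions. First, `--resample_alpha` acts as a softmax temperature over gated rewards. Second, `--val_samples_per_target` draws are taken from the `--num_pool` candidates. `softmax_resample` and its `is_valid` callback are hypothetical names. This sketch also filters invalid sequences before weighting, and the repository may filter at a different stage.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def softmax_resample(
    candidates: Sequence[str],
    rewards: Sequence[float],
    num_samples: int,
    alpha: float = 0.1,
    is_valid: Optional[Callable[[str], bool]] = None,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Draw `num_samples` sequences with probability proportional to exp(reward / alpha).

    Assumption: `alpha` stands in for --resample_alpha as a softmax temperature.
    `is_valid` is a hypothetical stand-in for the peptide validity filter.
    """
    # Drop candidates that fail the validity check before weighting.
    pool = [(c, r) for c, r in zip(candidates, rewards) if is_valid is None or is_valid(c)]
    if not pool:
        return []
    # Subtract the max reward so exp() cannot overflow for a small alpha.
    max_r = max(r for _, r in pool)
    weights = [math.exp((r - max_r) / alpha) for _, r in pool]
    if rng is None:
        rng = random.Random()
    # Sample with replacement according to the softmax weights.
    return rng.choices([c for c, _ in pool], weights=weights, k=num_samples)
```

Under these assumptions, the flags above correspond to one call per (target, direction) pair, `softmax_resample(pool, rewards, 8, alpha=0.1)`, on a pool of 32 scored candidates. A smaller `alpha` concentrates the draws on the top-scoring sequences, and a larger one moves toward uniform sampling.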

Output: `results/td3b_results_seed42.csv`, with columns `target`, `sequence`, `direction`, `affinity`, `gated_reward`, `direction_oracle`, and `direction_accuracy`.
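
To get per-target numbers from this file, you can average the score columns over each (target, direction) pair. The sketch below uses only the standard library. It assumes that `affinity`, `gated_reward`, and `direction_accuracy` hold numbers or `True`/`False` flags on each row. `summarize_rows` and `summarize_csv` are hypothetical helpers, not part of the repository.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

METRICS = ("affinity", "gated_reward", "direction_accuracy")


def _to_float(value: str) -> float:
    # Accept plain numbers as well as "True"/"False" flags.
    text = str(value).strip().lower()
    if text in ("true", "false"):
        return 1.0 if text == "true" else 0.0
    return float(text)


def summarize_rows(rows: Iterable[Mapping[str, str]]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Average each metric per (target, direction) pair."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    sums: Dict[Tuple[str, str], Dict[str, float]] = defaultdict(lambda: dict.fromkeys(METRICS, 0.0))
    for row in rows:
        key = (row["target"], row["direction"])
        counts[key] += 1
        for col in METRICS:
            sums[key][col] += _to_float(row[col])
    return {key: {col: total / counts[key] for col, total in sums[key].items()} for key in sums}


def summarize_csv(path: str) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Read a results CSV (e.g. results/td3b_results_seed42.csv) and summarize it."""
    with open(path, newline="") as f:
        return summarize_rows(csv.DictReader(f))
```

For example, `summarize_csv("results/td3b_results_seed42.csv")` returns a dictionary keyed by `(target, direction)` with the mean of each metric.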

## Training

### Multi-target TD3B

1. Edit `launch_multi_target.sh` and set the paths to the checkpoints, data, and Direction Oracle:

```bash
BASE_PATH="/path/to/TD3B"
PRETRAINED_CHECKPOINT="${BASE_PATH}/checkpoints/pretrained.ckpt"
TRAIN_CSV="${BASE_PATH}/data/train.csv"
ORACLE_CKPT="${BASE_PATH}/checkpoints/direction_oracle.pt"
```

2. Launch training:

```bash
bash launch_multi_target.sh
```

Key hyperparameters (in `launch_multi_target.sh`):
- `CONTRASTIVE_WEIGHT=0.1`: λ for L_ctr
- `KL_BETA=0.1`: β for L_KL
- `SIGMOID_TEMPERATURE=0.1`: τ for the gated reward
- `NUM_ITER=20`: MCTS iterations per round
- `NUM_CHILDREN=16`: children per MCTS expansion
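
The contrastive and KL weights enter the training objective L_WDCE + λ·L_ctr + β·L_KL listed for `td3b_losses.py`. The sketch below collects the launcher defaults in a hypothetical `TD3BHyperparams` dataclass (not part of the repository) to show how the weighted sum combines. It takes the three loss terms as already-computed values and does not reproduce how each term is calculated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TD3BHyperparams:
    """Hypothetical container; defaults copy the values listed for launch_multi_target.sh."""

    contrastive_weight: float = 0.1   # CONTRASTIVE_WEIGHT (lambda for L_ctr)
    kl_beta: float = 0.1              # KL_BETA (beta for L_KL)
    sigmoid_temperature: float = 0.1  # SIGMOID_TEMPERATURE (temperature in the gated reward)
    num_iter: int = 20                # NUM_ITER (MCTS iterations per round)
    num_children: int = 16            # NUM_CHILDREN (children per MCTS expansion)

    def total_loss(self, l_wdce, l_ctr, l_kl):
        """L = L_WDCE + lambda * L_ctr + beta * L_KL.

        Works for Python floats or any type supporting + and * (e.g. scalar tensors).
        """
        return l_wdce + self.contrastive_weight * l_ctr + self.kl_beta * l_kl
```

With the defaults, component losses of 2.0, 0.5, and 0.3 combine to 2.0 + 0.05 + 0.03 = 2.08. Because λ and β are both 0.1, the WDCE term dominates unless the auxiliary terms are roughly an order of magnitude larger.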

### Baselines

Run the baseline methods (CG, SMC, TDS, PepTune, Unguided) from the `baselines/` directory. For example, for CG, SMC, and TDS:

```bash
cd baselines/
bash run.sh --baseline cg --device cuda:0
bash run.sh --baseline smc --device cuda:0
bash run.sh --baseline tds --device cuda:0
```

## Citation

```bibtex
@article{caotd3b,
  title={TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation},
  author={Cao, Hanqun and Pal, Aastha and Tang, Sophia and Zhang, Yinuo and Zhang, Jingjie and Heng, Pheng-Ann and Chatterjee, Pranam}
}
```
|