# TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation
TD3B is a sequence-based generative framework that designs peptide binders with specified agonist or antagonist behavior. It combines a Direction Oracle, a soft binding-affinity gate, and amortized fine-tuning of a pre-trained discrete diffusion model (MDLM).
## Installation
```bash
conda env create -f env.yml
conda activate td3b
pip install -e .
```
## Data and Checkpoints
Download the pretrained checkpoints and data from [Google Drive (TBA)](placeholder_link).
Place the files as follows:
```
TD3B/
├── checkpoints/
│   ├── pretrained.ckpt          # Pre-trained MDLM weights
│   ├── td3b.ckpt                # Fine-tuned TD3B model
│   └── direction_oracle.pt      # Direction Oracle weights
├── data/
│   ├── train.csv                # Training set (target-binder pairs)
│   └── test.csv                 # Test set
├── scoring/functions/classifiers/
│   ├── binding-affinity.pt
│   ├── hemolysis-xgboost.json
│   ├── nonfouling-xgboost.json
│   ├── permeability-xgboost.json
│   └── solubility-xgboost.json
└── tokenizer/
    ├── new_vocab.txt
    └── new_splits.txt
```
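Before running inference or training, it can help to confirm that every file in the layout above is in place. The following is a minimal sketch, not part of the repository: `EXPECTED_FILES` and `missing_files` are hypothetical names, and the paths are copied from the tree above.

```python
from pathlib import Path

# Relative paths copied from the directory layout in this README.
EXPECTED_FILES = [
    "checkpoints/pretrained.ckpt",
    "checkpoints/td3b.ckpt",
    "checkpoints/direction_oracle.pt",
    "data/train.csv",
    "data/test.csv",
    "scoring/functions/classifiers/binding-affinity.pt",
    "scoring/functions/classifiers/hemolysis-xgboost.json",
    "scoring/functions/classifiers/nonfouling-xgboost.json",
    "scoring/functions/classifiers/permeability-xgboost.json",
    "scoring/functions/classifiers/solubility-xgboost.json",
    "tokenizer/new_vocab.txt",
    "tokenizer/new_splits.txt",
]


def missing_files(root):
    """Return the expected relative paths that are not regular files under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the TD3B/ repository root.
    for rel in missing_files("."):
        print("missing:", rel)
```

An empty result means all twelve files listed above were found.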
## Code Structure
```
TD3B/
├── inference.py                 # Generate binders (main inference entry point)
├── finetune_multi_target.py     # Multi-target TD3B training
├── finetune_utils.py            # Training utilities
├── launch_multi_target.sh       # Training launcher script
├── diffusion.py                 # MDLM backbone (TR2-D2)
├── roformer.py                  # RoFormer wrapper
├── noise_schedule.py            # Noise schedules
├── peptide_mcts.py              # MCTS tree search
├── td3b/
│   ├── direction_oracle.py      # Direction Oracle (f_φ)
│   ├── td3b_scoring.py          # Gated reward R = g_ψ · σ(d*·(f_φ−0.5)/τ)
│   ├── td3b_losses.py           # L_WDCE + λ·L_ctr + β·L_KL
│   ├── td3b_mcts.py             # TD3B-extended MCTS
│   ├── td3b_finetune.py         # Training loop
│   └── data_utils.py            # Data loading utilities
├── scoring/                     # Affinity predictor (g_ψ) and property classifiers
├── baselines/                   # CG, SMC, TDS, PepTune, Unguided baselines
├── tokenizer/                   # SMILES tokenizer (vocab + splits)
├── configs/                     # Model and training configs
└── utils/                       # Misc utilities
```
## Inference
Generate agonist/antagonist binders for target proteins:
```bash
python inference.py \
--ckpt_path checkpoints/td3b.ckpt \
--val_csv data/test.csv \
--save_path results/ \
--seed 42 \
--num_pool 32 \
--val_samples_per_target 8 \
--resample_alpha 0.1
```
This generates 32 candidates per (target, direction), scores them with the Direction Oracle and affinity predictor, applies Algorithm 2 weighted resampling, and saves only valid peptide samples.
Output: `results/td3b_results_seed42.csv`, with columns `target`, `sequence`, `direction`, `affinity`, `gated_reward`, `direction_oracle`, and `direction_accuracy`.
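As a rough illustration of working with that file, the sketch below averages `gated_reward` per (target, direction) pair using only the standard library. The helper name `mean_reward_by_target` is hypothetical, and the code assumes `gated_reward` parses as a float; the encoding of `direction` is not specified here, so it is treated as an opaque string.

```python
import csv
from collections import defaultdict


def mean_reward_by_target(csv_file):
    """Average gated_reward per (target, direction) from an open CSV file object."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, row count]
    for row in csv.DictReader(csv_file):
        key = (row["target"], row["direction"])
        totals[key][0] += float(row["gated_reward"])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Usage, assuming inference has already written the results file:
# with open("results/td3b_results_seed42.csv", newline="") as fh:
#     for (target, direction), reward in mean_reward_by_target(fh).items():
#         print(target, direction, round(reward, 3))
```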
## Training
### Multi-target TD3B
1. Edit `launch_multi_target.sh` to set the paths to the checkpoints, data, and oracle:
```bash
BASE_PATH="/path/to/TD3B"
PRETRAINED_CHECKPOINT="${BASE_PATH}/checkpoints/pretrained.ckpt"
TRAIN_CSV="${BASE_PATH}/data/train.csv"
ORACLE_CKPT="${BASE_PATH}/checkpoints/direction_oracle.pt"
```
2. Launch training:
```bash
bash launch_multi_target.sh
```
Key hyperparameters (in `launch_multi_target.sh`):
- `CONTRASTIVE_WEIGHT=0.1`: λ for L_ctr
- `KL_BETA=0.1`: β for L_KL
- `SIGMOID_TEMPERATURE=0.1`: τ for the gated reward
- `NUM_ITER=20`: MCTS iterations per round
- `NUM_CHILDREN=16`: children per MCTS expansion
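
To show where τ, λ, and β enter, here is a minimal scalar sketch of the two formulas given in the Code Structure comments: the gated reward R = g_ψ · σ(d*·(f_φ − 0.5)/τ) and the objective L_WDCE + λ·L_ctr + β·L_KL. It is not the code in `td3b/td3b_scoring.py` or `td3b/td3b_losses.py`. The function names, the encoding of d* as +1 or −1, and the assumption that f_φ and g_ψ lie in [0, 1] are illustrative choices.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_reward(affinity_gate, oracle_score, direction, tau=0.1):
    """R = g * sigmoid(d * (f - 0.5) / tau).

    affinity_gate: g_psi, assumed to lie in [0, 1]
    oracle_score:  f_phi, assumed to lie in [0, 1]
    direction:     d*, assumed to be +1 or -1 (which sign means agonist is not fixed here)
    tau:           SIGMOID_TEMPERATURE
    """
    return affinity_gate * sigmoid(direction * (oracle_score - 0.5) / tau)


def total_loss(l_wdce, l_ctr, l_kl, lam=0.1, beta=0.1):
    """L_WDCE + lam * L_ctr + beta * L_KL, with lam = CONTRASTIVE_WEIGHT and beta = KL_BETA."""
    return l_wdce + lam * l_ctr + beta * l_kl
```

Under these assumptions, a smaller τ makes the sigmoid sharper, so the reward moves toward either zero or the affinity gate as soon as the oracle score leaves 0.5. With τ = 0.1 and an oracle score of 0.9 in the requested direction, the sigmoid term is σ(4) ≈ 0.98.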
### Baselines
Run baseline methods (CG, SMC, TDS, PepTune, Unguided):
```bash
cd baselines/
bash run.sh --baseline cg --device cuda:0
bash run.sh --baseline smc --device cuda:0
bash run.sh --baseline tds --device cuda:0
```
## Citation
```bibtex
@article{caotd3b,
title={TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation},
author={Cao, Hanqun and Pal, Aastha and Tang, Sophia and Zhang, Yinuo and Zhang, Jingjie and Heng, Pheng-Ann and Chatterjee, Pranam}
}
```