---
pipeline_tag: text-generation
---

# TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation

![Screenshot 2026-05-10 at 6.52.29 PM](https://cdn-uploads.huggingface.co/production/uploads/64cd5b3f0494187a9e8b7c69/38-6PQ83pPraF7KAs3g3E.png)

TD3B is a sequence-based generative framework that designs peptide binders with specified agonist or antagonist behavior. It combines a target-aware Direction Oracle, a soft binding-affinity gate, and amortized fine-tuning of a pre-trained discrete diffusion model (MDLM).

The model was presented in [TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation](https://huggingface.co/papers/2605.09810).

## Installation

```bash
conda env create -f env.yml
conda activate td3b
pip install -e .
```

## Data and Checkpoints

Download the pretrained checkpoints and data from [Google Drive (TBA)](placeholder_link). Place the files as follows:

```
TD3B/
├── checkpoints/
│   ├── pretrained.ckpt          # Pre-trained MDLM weights
│   ├── td3b.ckpt                # Fine-tuned TD3B model
│   └── direction_oracle.pt      # Direction Oracle weights
├── data/
│   ├── train.csv                # Training set (target-binder pairs)
│   └── test.csv                 # Test set
├── scoring/functions/classifiers/
│   ├── binding-affinity.pt
│   ├── hemolysis-xgboost.json
│   ├── nonfouling-xgboost.json
│   ├── permeability-xgboost.json
│   └── solubility-xgboost.json
└── tokenizer/
    ├── new_vocab.txt
    └── new_splits.txt
```

## Code Structure

```
TD3B/
├── inference.py                 # Generate binders (main inference entry point)
├── finetune_multi_target.py     # Multi-target TD3B training
├── finetune_utils.py            # Training utilities
├── launch_multi_target.sh       # Training launcher script
├── diffusion.py                 # MDLM backbone (TR2-D2)
├── roformer.py                  # RoFormer wrapper
├── noise_schedule.py            # Noise schedules
├── peptide_mcts.py              # MCTS tree search
├── td3b/
│   ├── direction_oracle.py      # Direction Oracle (f_φ)
│   ├── td3b_scoring.py          # Gated reward R = g_ψ · σ(d*·(f_φ−0.5)/τ)
│   ├── td3b_losses.py           # L_WDCE + λ·L_ctr + β·L_KL
│   ├── td3b_mcts.py             # TD3B-extended MCTS
│   ├── td3b_finetune.py         # Training loop
│   └── data_utils.py            # Data loading utilities
├── scoring/                     # Affinity predictor (g_ψ) and property classifiers
├── baselines/                   # CG, SMC, TDS, PepTune, Unguided baselines
├── tokenizer/                   # SMILES tokenizer (vocab + splits)
├── configs/                     # Model and training configs
└── utils/                       # Misc utilities
```

## Inference

Generate agonist/antagonist binders for target proteins:

```bash
python inference.py \
    --ckpt_path checkpoints/td3b.ckpt \
    --val_csv data/test.csv \
    --save_path results/ \
    --seed 42 \
    --num_pool 32 \
    --val_samples_per_target 8 \
    --resample_alpha 0.1
```

This generates 32 candidates per (target, direction), scores them with the Direction Oracle and affinity predictor, applies Algorithm 2 weighted resampling, and saves only valid peptide samples.

Output: `results/td3b_results_seed42.csv` with columns: target, sequence, direction, affinity, gated_reward, direction_oracle, direction_accuracy.

## Training

### Multi-target TD3B

1. Edit `launch_multi_target.sh` to set the paths to the checkpoints, data, and oracle:

```bash
BASE_PATH="/path/to/TD3B"
PRETRAINED_CHECKPOINT="${BASE_PATH}/checkpoints/pretrained.ckpt"
TRAIN_CSV="${BASE_PATH}/data/train.csv"
ORACLE_CKPT="${BASE_PATH}/checkpoints/direction_oracle.pt"
```

2. Launch training:

```bash
bash launch_multi_target.sh
```

Key hyperparameters (in `launch_multi_target.sh`):

- `CONTRASTIVE_WEIGHT=0.1`: λ for L_ctr
- `KL_BETA=0.1`: β for L_KL
- `SIGMOID_TEMPERATURE=0.1`: τ for gated reward
- `NUM_ITER=20`: MCTS iterations per round
- `NUM_CHILDREN=16`: Children per MCTS expansion

### Baselines

Run baseline methods (CG, SMC, TDS, PepTune, Unguided):

```bash
cd baselines/
bash run.sh --baseline cg --device cuda:0
bash run.sh --baseline smc --device cuda:0
bash run.sh --baseline tds --device cuda:0
```

## Citation

```bibtex
@inproceedings{cao2026td3b,
  title={TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation},
  author={Hanqun Cao and Aastha Pal and Sophia Tang and Yinuo Zhang and Jingjie Zhang and Pheng Ann Heng and Pranam Chatterjee},
  booktitle={Forty-third International Conference on Machine Learning},
  year={2026},
  url={https://huggingface.co/papers/2605.09810}
}
```
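
## Gated Reward Sketch

The Code Structure listing gives the gated reward as R = g_ψ · σ(d*·(f_φ−0.5)/τ). The snippet below is a minimal sketch of that formula for illustration and is not the repository's `td3b_scoring.py`. It assumes that g_ψ and f_φ are scalars in [0, 1], that d* is encoded as +1 for agonist and −1 for antagonist, and that τ defaults to the `SIGMOID_TEMPERATURE=0.1` value listed under Training. The function and argument names are hypothetical.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_reward(gate: float, oracle_prob: float, direction: int, tau: float = 0.1) -> float:
    """Sketch of R = g * sigmoid(d * (f - 0.5) / tau).

    gate        -- g_psi, assumed to be a soft affinity gate in [0, 1]
    oracle_prob -- f_phi, assumed to be a Direction Oracle probability in [0, 1]
    direction   -- d*, assumed encoding: +1 (agonist) or -1 (antagonist)
    tau         -- sigmoid temperature (assumed default 0.1)
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    if tau <= 0:
        raise ValueError("tau must be positive")
    return gate * _sigmoid(direction * (oracle_prob - 0.5) / tau)


# Hypothetical candidate: strong binder that the oracle scores as likely agonist.
r_agonist = gated_reward(gate=0.8, oracle_prob=0.9, direction=+1)
r_antagonist = gated_reward(gate=0.8, oracle_prob=0.9, direction=-1)
print(f"agonist target: {r_agonist:.3f}, antagonist target: {r_antagonist:.3f}")
```

Under these assumptions the reward is bounded above by the gate value, so a candidate with low predicted affinity scores low regardless of its predicted direction. Because σ(z) + σ(−z) = 1, the rewards for the two directions sum to the gate value, and a smaller τ sharpens the split between them.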