
TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation

TD3B is a sequence-based generative framework that designs peptide binders with specified agonist or antagonist behavior. It combines a Direction Oracle, a soft binding-affinity gate, and amortized fine-tuning of a pre-trained discrete diffusion model (MDLM).

Installation

conda env create -f env.yml
conda activate td3b
pip install -e .

Data and Checkpoints

Download the pretrained checkpoints and data from Google Drive (TBA).

Place the files as follows:

TD3B/
├── checkpoints/
│   ├── pretrained.ckpt          # Pre-trained MDLM weights
│   ├── td3b.ckpt                # Fine-tuned TD3B model
│   └── direction_oracle.pt      # Direction Oracle weights
├── data/
│   ├── train.csv                # Training set (target-binder pairs)
│   └── test.csv                 # Test set
├── scoring/functions/classifiers/
│   ├── binding-affinity.pt
│   ├── hemolysis-xgboost.json
│   ├── nonfouling-xgboost.json
│   ├── permeability-xgboost.json
│   └── solubility-xgboost.json
└── tokenizer/
    ├── new_vocab.txt
    └── new_splits.txt
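
Before running anything, you can sanity-check the layout with a short snippet (run from the repository root; the paths simply mirror the tree above):

from pathlib import Path

# Files expected by inference and training, per the layout above.
required = [
    "checkpoints/pretrained.ckpt",
    "checkpoints/td3b.ckpt",
    "checkpoints/direction_oracle.pt",
    "data/train.csv",
    "data/test.csv",
    "scoring/functions/classifiers/binding-affinity.pt",
    "tokenizer/new_vocab.txt",
    "tokenizer/new_splits.txt",
]
missing = [p for p in required if not Path(p).exists()]
print("All files present." if not missing else f"Missing: {missing}")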

Code Structure

TD3B/
├── inference.py                 # Generate binders (main inference entry point)
├── finetune_multi_target.py     # Multi-target TD3B training
├── finetune_utils.py            # Training utilities
├── launch_multi_target.sh       # Training launcher script
├── diffusion.py                 # MDLM backbone (TR2-D2)
├── roformer.py                  # RoFormer wrapper
├── noise_schedule.py            # Noise schedules
├── peptide_mcts.py              # MCTS tree search
├── td3b/
│   ├── direction_oracle.py      # Direction Oracle (f_φ)
│   ├── td3b_scoring.py          # Gated reward R = g_ψ · σ(d*·(f_φ − 0.5)/τ)
│   ├── td3b_losses.py           # L_WDCE + λ·L_ctr + β·L_KL
│   ├── td3b_mcts.py             # TD3B-extended MCTS
│   ├── td3b_finetune.py         # Training loop
│   └── data_utils.py            # Data loading utilities
├── scoring/                     # Affinity predictor (g_ψ) and property classifiers
├── baselines/                   # CG, SMC, TDS, PepTune, Unguided baselines
├── tokenizer/                   # SMILES tokenizer (vocab + splits)
├── configs/                     # Model and training configs
└── utils/                       # Misc utilities
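
The gated reward implemented in td3b/td3b_scoring.py combines the affinity predictor g_ψ with the Direction Oracle f_φ as R = g_ψ · σ(d*·(f_φ − 0.5)/τ). A minimal sketch of that formula (the function name, the [0, 1] ranges, and the ±1 direction encoding are illustrative assumptions, not the repository's API):

import torch

def gated_reward(affinity, oracle_prob, direction, tau=0.1):
    """Sketch of R = g_psi * sigmoid(d* * (f_phi - 0.5) / tau).

    affinity:    g_psi, predicted binding affinity (assumed in [0, 1])
    oracle_prob: f_phi, Direction Oracle probability of agonism (assumed in [0, 1])
    direction:   d*, assumed +1 for agonist and -1 for antagonist
    tau:         sigmoid temperature (SIGMOID_TEMPERATURE in the training script)
    """
    gate = torch.sigmoid(direction * (oracle_prob - 0.5) / tau)
    return affinity * gate

# Example: a strong binder whose oracle output agrees with the requested direction
# keeps most of its affinity; disagreement drives the gate, and hence R, toward 0.
r = gated_reward(torch.tensor(0.9), torch.tensor(0.8), direction=+1)  # ~0.86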

Inference

Generate agonist/antagonist binders for target proteins:

python inference.py \
    --ckpt_path checkpoints/td3b.ckpt \
    --val_csv data/test.csv \
    --save_path results/ \
    --seed 42 \
    --num_pool 32 \
    --val_samples_per_target 8 \
    --resample_alpha 0.1

This generates 32 candidates per (target, direction) pair, scores them with the Direction Oracle and affinity predictor, applies the weighted resampling of Algorithm 2, and saves only valid peptide samples.
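
As a rough illustration only of the resampling step (assuming softmax weights over gated rewards with temperature --resample_alpha; the exact weighting in Algorithm 2 may differ):

import numpy as np

def resample_pool(sequences, rewards, alpha=0.1, seed=42):
    """Resample the candidate pool with probability increasing in gated reward."""
    rng = np.random.default_rng(seed)
    rewards = np.asarray(rewards, dtype=float)
    # Softmax with temperature alpha: smaller alpha concentrates mass on top candidates.
    logits = rewards / alpha
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    idx = rng.choice(len(sequences), size=len(sequences), replace=True, p=weights)
    return [sequences[i] for i in idx]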

Output: results/td3b_results_seed42.csv with columns: target, sequence, direction, affinity, gated_reward, direction_oracle, direction_accuracy.
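
The CSV can be inspected with pandas, for example to summarize per-target results (column names as listed above):

import pandas as pd

df = pd.read_csv("results/td3b_results_seed42.csv")
summary = (
    df.groupby(["target", "direction"])
      .agg(mean_affinity=("affinity", "mean"),
           mean_gated_reward=("gated_reward", "mean"),
           direction_accuracy=("direction_accuracy", "mean"),
           n_sequences=("sequence", "count"))
      .reset_index()
)
print(summary)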

Training

Multi-target TD3B

  1. Edit launch_multi_target.sh to set paths to checkpoints, data, and the Direction Oracle:

BASE_PATH="/path/to/TD3B"
PRETRAINED_CHECKPOINT="${BASE_PATH}/checkpoints/pretrained.ckpt"
TRAIN_CSV="${BASE_PATH}/data/train.csv"
ORACLE_CKPT="${BASE_PATH}/checkpoints/direction_oracle.pt"

  2. Launch training:

bash launch_multi_target.sh

Key hyperparameters (in launch_multi_target.sh):

  • CONTRASTIVE_WEIGHT=0.1: λ for L_ctr
  • KL_BETA=0.1: β for L_KL
  • SIGMOID_TEMPERATURE=0.1: τ for the gated reward
  • NUM_ITER=20: MCTS iterations per round
  • NUM_CHILDREN=16: children per MCTS expansion
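
These settings map onto the objective computed in td3b/td3b_losses.py and the gated reward in td3b/td3b_scoring.py. A schematic sketch of how the weights enter the total loss (the individual loss terms are placeholders assumed to be computed elsewhere in the training loop):

# Schematic only: l_wdce, l_ctr, l_kl stand in for loss tensors produced by the
# training loop; the weights mirror the variables in launch_multi_target.sh.
CONTRASTIVE_WEIGHT = 0.1   # lambda, weight on L_ctr
KL_BETA = 0.1              # beta, weight on L_KL

def total_loss(l_wdce, l_ctr, l_kl,
               contrastive_weight=CONTRASTIVE_WEIGHT, kl_beta=KL_BETA):
    # L = L_WDCE + lambda * L_ctr + beta * L_KL
    return l_wdce + contrastive_weight * l_ctr + kl_beta * l_kl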

Baselines

Run baseline methods (CG, SMC, TDS, PepTune, Unguided):

cd baselines/
bash run.sh --baseline cg --device cuda:0
bash run.sh --baseline smc --device cuda:0
bash run.sh --baseline tds --device cuda:0

Citation

@inproceedings{cao2026td3b,
  title={TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation},
  author={Anonymous},
  booktitle={Forty-third International Conference on Machine Learning},
  year={2026},
  url={https://openreview.net/forum?id=RNuC8Nj6rD}
}