EDT-Former: Full Model (Stage 2)

The full EDT-Former model (encoder + Llama-3.1-8B-Instruct), as described in the ICLR 2026 paper:

Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding
Zihao Jing, Qiuhao Zeng, Ruiyi Fang, Yan Sun, Boyu Wang, Pingzhao Hu
ICLR 2026

Model Description

EDT-Former aligns molecular graphs with a frozen LLM backbone (Llama-3.1-8B-Instruct) via an entropy-guided dynamic token connector. Key properties:

  • No LLM backbone fine-tuning: only the embedding layer and connector are trained, which keeps training computationally efficient
  • Entropy-guided dynamic token selection preserves both local (substructural) and global molecular features (see the sketch after this list)
  • State-of-the-art results on MoleculeQA, Mol-Instructions (forward reaction prediction, retrosynthesis, reagent prediction, molecule design, open-ended QA), TDC, and MoleculeNet benchmarks
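
As a rough illustration of the selection idea, the minimal PyTorch sketch below scores graph-encoder node embeddings by Shannon entropy and keeps the top-k as local tokens, plus one mean-pooled global token. The function name, the entropy scoring rule, and the shapes are assumptions made for illustration; the actual EDT-Former connector is defined in the repository.

import torch
import torch.nn.functional as F

def select_dynamic_tokens(node_emb: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Illustrative sketch only, not the paper's exact connector.

    node_emb: (num_nodes, d) graph-encoder outputs for one molecule.
    Returns (k + 1, d) tokens to feed the LLM connector.
    """
    # Treat each node's softmax-normalized features as a distribution
    # and score nodes by Shannon entropy.
    p = F.softmax(node_emb, dim=-1)                       # (N, d)
    entropy = -(p * (p + 1e-12).log()).sum(dim=-1)        # (N,)
    k = min(k, node_emb.size(0))
    idx = entropy.topk(k).indices                         # top-k local tokens
    local_tokens = node_emb[idx]                          # (k, d) substructure view
    global_token = node_emb.mean(dim=0, keepdim=True)     # (1, d) whole-graph view
    return torch.cat([local_tokens, global_token], dim=0)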

This Stage 2 checkpoint (~16 GB) is the final instruction-tuned model ready for downstream molecular QA tasks.

Usage

# 1. Clone the repo and set up the environment
git clone https://github.com/selmiss/DQ-Former.git
cd DQ-Former
conda env create -f environment.yml
conda activate edtformer

# 2. Configure paths in local.env.sh
cp env.sh local.env.sh
# Edit local.env.sh: set BASE_DIR, DATA_DIR, CHECKPOINT_DIR
source local.env.sh

# 3. Download the Stage 2 checkpoint (~16 GB)
python -c "from huggingface_hub import snapshot_download; snapshot_download('zihaojing/EDT-Former-model', local_dir='checkpoints/edt_former_s2_large/final_model')"

# 4. Run inference (example: forward reaction prediction)
bash scripts/qa/mol_forward.sh
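
If you prefer to run the download step from a Python session rather than as a shell one-liner, the equivalent call is:

from huggingface_hub import snapshot_download

# Download the Stage 2 checkpoint into the path the QA scripts expect.
snapshot_download(
    "zihaojing/EDT-Former-model",
    local_dir="checkpoints/edt_former_s2_large/final_model",
)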

Downstream Task Scripts

All evaluation scripts are in scripts/qa/. Example tasks:

Task                         Script
Forward Reaction Prediction  scripts/qa/mol_forward.sh
Retrosynthesis               scripts/qa/retrosynthesis.sh
Reagent Prediction           scripts/qa/reagent_prediction.sh
Molecule Design              scripts/qa/mol_design.sh
Open-ended QA                scripts/qa/open_question.sh
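
To run several evaluations in sequence, a small driver like the sketch below works, assuming each script takes no arguments and reads its configuration from local.env.sh as in the Usage section. This helper is hypothetical and not part of the repository:

import subprocess

# Hypothetical driver: run each evaluation script in order, stopping on failure.
SCRIPTS = [
    "scripts/qa/mol_forward.sh",
    "scripts/qa/retrosynthesis.sh",
    "scripts/qa/reagent_prediction.sh",
    "scripts/qa/mol_design.sh",
    "scripts/qa/open_question.sh",
]

for script in SCRIPTS:
    print(f"=== Running {script} ===")
    subprocess.run(["bash", script], check=True)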

Training Details

Setting          Value
LLM backbone     Llama-3.1-8B-Instruct (frozen)
Stage 1 encoder  zihaojing/EDT-Former-encoder
Training data    zihaojing/EDT-Former-sft-data
Epochs           2
Learning rate    1e-4 with a cosine schedule
Batch size       4 per device × 8 gradient-accumulation steps = 32 effective
Precision        BF16
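
The batch-size row is the usual per-device batch times gradient-accumulation product. As an illustration only (this is not the repository's actual launch configuration), the table maps onto Hugging Face TrainingArguments as:

from transformers import TrainingArguments

# Illustrative mapping of the settings above; the real entry point is in the repo.
args = TrainingArguments(
    output_dir="checkpoints/edt_former_s2_large",
    num_train_epochs=2,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # effective batch size: 4 x 8 = 32
    bf16=True,
)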

Related Resources

  • Stage 1 encoder: zihaojing/EDT-Former-encoder
  • Instruction-tuning data: zihaojing/EDT-Former-sft-data
  • Code: https://github.com/selmiss/DQ-Former

Citation

@inproceedings{jing2026edtformer,
  title={Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding},
  author={Jing, Zihao and Zeng, Qiuhao and Fang, Ruiyi and Sun, Yan and Wang, Boyu and Hu, Pingzhao},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}