---
license: mit
language:
- en
tags:
- molecules
- chemistry
- molecular-understanding
- graph-llm
- llama
- instruction-tuning
pipeline_tag: text-generation
base_model: unsloth/Llama-3.1-8B-Instruct
---
# EDT-Former: Full Model (Stage 2)
The full **EDT-Former** model (encoder + Llama-3.1-8B-Instruct), as described in the ICLR 2026 paper:
> **Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding**
> Zihao Jing, Qiuhao Zeng, Ruiyi Fang, Yan Sun, Boyu Wang, Pingzhao Hu
> *ICLR 2026* · [Paper](https://www.arxiv.org/abs/2602.02742) · [Code](https://github.com/selmiss/DQ-Former)
## Model Description
EDT-Former aligns molecular graphs with a frozen LLM backbone (Llama-3.1-8B-Instruct) via an entropy-guided dynamic token connector. Key properties:
- **No LLM backbone fine-tuning**: only the embedding layer and the connector are trained, which keeps training computationally cheap
- **Entropy-guided dynamic token selection** preserves both local (substructural) and global molecular features
- **State-of-the-art results** on MoleculeQA, Mol-Instructions (forward reaction prediction, retrosynthesis, reagent prediction, molecule design, open-ended QA), TDC, and MoleculeNet benchmarks
This Stage 2 checkpoint (~16 GB) is the final instruction-tuned model ready for downstream molecular QA tasks.
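To make the connector idea concrete, the sketch below illustrates one plausible form of entropy-guided token selection: score each node embedding by the entropy of its attention distribution over the graph, keep the top-k nodes as dynamic tokens, and append a pooled global token. All names and the exact selection criterion here are hypothetical; the paper's actual mechanism lives in the linked code.

```python
# Illustrative sketch of entropy-guided dynamic token selection.
# Hypothetical names/criterion -- see the paper and repo for the real mechanism.
import torch
import torch.nn.functional as F

def select_dynamic_tokens(node_embeddings: torch.Tensor, k: int) -> torch.Tensor:
    """node_embeddings: (num_nodes, hidden_dim).
    Returns (k + 1, hidden_dim): k high-entropy node tokens plus a global mean token."""
    # Self-attention scores of each node over all nodes
    scores = node_embeddings @ node_embeddings.T / node_embeddings.shape[-1] ** 0.5
    attn = F.softmax(scores, dim=-1)                             # (N, N)
    # Entropy of each node's attention distribution
    entropy = -(attn * attn.clamp_min(1e-9).log()).sum(dim=-1)   # (N,)
    top = entropy.topk(min(k, entropy.numel())).indices
    global_token = node_embeddings.mean(dim=0, keepdim=True)     # global feature
    return torch.cat([node_embeddings[top], global_token], dim=0)

# Example: a 12-atom molecule with 256-dim node features, keeping 4 dynamic tokens
tokens = select_dynamic_tokens(torch.randn(12, 256), k=4)
print(tokens.shape)  # torch.Size([5, 256])
```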
## Usage
```bash
# 1. Clone the repo and set up the environment
git clone https://github.com/selmiss/DQ-Former.git
cd DQ-Former
conda env create -f environment.yml
conda activate edtformer

# 2. Configure paths in local.env.sh
cp env.sh local.env.sh
# Edit local.env.sh: set BASE_DIR, DATA_DIR, CHECKPOINT_DIR
source local.env.sh
```

```python
# 3. Download the Stage 2 checkpoint
from huggingface_hub import snapshot_download

snapshot_download(
    "zihaojing/EDT-Former-model",
    local_dir="checkpoints/edt_former_s2_large/final_model",
)
```

```bash
# 4. Run inference (example: forward reaction prediction)
bash scripts/qa/mol_forward.sh
```
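If the download succeeded, the checkpoint directory should hold the usual Hugging Face files; a quick sanity check (illustrative, exact file names vary):

```python
# Optional sanity check: list the downloaded checkpoint contents.
import os

ckpt_dir = "checkpoints/edt_former_s2_large/final_model"
print(sorted(os.listdir(ckpt_dir)))  # expect config/tokenizer files and weight shards
```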
## Downstream Task Scripts
All evaluation scripts are in `scripts/qa/`. Example tasks (a sketch for batch-running them follows the table):
| Task | Script |
|------|--------|
| Forward Reaction Prediction | `scripts/qa/mol_forward.sh` |
| Retrosynthesis | `scripts/qa/retrosynthesis.sh` |
| Reagent Prediction | `scripts/qa/reagent_prediction.sh` |
| Molecule Design | `scripts/qa/mol_design.sh` |
| Open-ended QA | `scripts/qa/open_question.sh` |
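To batch all five evaluations, a minimal Python driver such as the one below works, assuming `local.env.sh` exports its variables so the child shells inherit them (a hypothetical helper, not part of the repo):

```python
# Run every downstream evaluation script in sequence (illustrative driver).
import subprocess

SCRIPTS = [
    "scripts/qa/mol_forward.sh",
    "scripts/qa/retrosynthesis.sh",
    "scripts/qa/reagent_prediction.sh",
    "scripts/qa/mol_design.sh",
    "scripts/qa/open_question.sh",
]

for script in SCRIPTS:
    print(f"=== {script} ===")
    subprocess.run(["bash", script], check=True)  # stop on the first failure
```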
## Training Details
| Setting | Value |
|---------|-------|
| LLM backbone | Llama-3.1-8B-Instruct (frozen) |
| Stage 1 encoder | [zihaojing/EDT-Former-encoder](https://huggingface.co/zihaojing/EDT-Former-encoder) |
| Training data | [zihaojing/EDT-Former-sft-data](https://huggingface.co/datasets/zihaojing/EDT-Former-sft-data) |
| Epochs | 2 |
| Learning rate | 1e-4 (cosine schedule) |
| Batch size | 4 per device × 8 gradient-accumulation steps = effective 32 |
| Precision | BF16 |
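For orientation, these hyperparameters map onto `transformers.TrainingArguments` roughly as sketched below; this is an assumption for illustration, as the repo's actual training entry point may configure things differently.

```python
# Hedged sketch: the table's settings expressed as TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="checkpoints/edt_former_s2_large",
    num_train_epochs=2,              # Epochs: 2
    learning_rate=1e-4,              # LR: 1e-4
    lr_scheduler_type="cosine",      # cosine schedule
    per_device_train_batch_size=4,   # 4 per device...
    gradient_accumulation_steps=8,   # ...x 8 accumulation = effective 32
    bf16=True,                       # BF16 precision
)
```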
## Related Resources
| Resource | Link |
|----------|------|
| Pretrain Data | [zihaojing/EDT-Former-pretrain-data](https://huggingface.co/datasets/zihaojing/EDT-Former-pretrain-data) |
| SFT Data | [zihaojing/EDT-Former-sft-data](https://huggingface.co/datasets/zihaojing/EDT-Former-sft-data) |
| Encoder (Stage 1) | [zihaojing/EDT-Former-encoder](https://huggingface.co/zihaojing/EDT-Former-encoder) |
| Code | [selmiss/DQ-Former](https://github.com/selmiss/DQ-Former) |
## Citation
```bibtex
@inproceedings{jing2026edtformer,
title={Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding},
author={Jing, Zihao and Zeng, Qiuhao and Fang, Ruiyi and Sun, Yan and Wang, Boyu and Hu, Pingzhao},
booktitle={International Conference on Learning Representations (ICLR)},
year={2026}
}
```