---
library_name: tedbench
tags:
- protein
- structure-sequence
- fold-classification
- tedbench
- saprot
pipeline_tag: other
license: bsd-3-clause
---

# TEDBench — SaProt-650M fine-tuned on TEDBench

Backbone: SaProt-650M (33 layers, hidden dim 1280). Requires [Foldseek](https://github.com/steineggerlab/foldseek) for structure-aware tokens.

Fine-tuned on [TEDBench](https://github.com/BorgwardtLab/TEDBench) for protein fold classification into 965 CATH topology (T-level) classes (ICML 2026).

## Usage

```python
import json
import sys
from pathlib import Path

sys.path.insert(0, "baselines")  # from repo root

import torch
from huggingface_hub import snapshot_download
from omegaconf import OmegaConf

from models.saprot_classifier import SaProtClassifier

local_dir = Path(snapshot_download("TEDBench/saprot-650M-ft"))
with open(local_dir / "config.json") as f:
    cfg = OmegaConf.create(json.load(f))

model = SaProtClassifier(cfg)
sd = torch.load(local_dir / "pytorch_model.bin", map_location="cpu", weights_only=False)
model.load_state_dict(sd)
model.eval()
```

Or pass the repo ID directly to the test script:

```bash
python baselines/saprot_test_ted.py train.ckpt_path=TEDBench/saprot-650M-ft
```

## Citation

```bibtex
@inproceedings{chen2026tedbench,
  title={Protein Fold Classification at Scale: Benchmarking and Pretraining},
  author={Chen, Dexiong and Manolache, Andrei and Niepert, Mathias and Borgwardt, Karsten},
  booktitle={Proceedings of the 43rd International Conference on Machine Learning},
  year={2026}
}
```
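
## Structure-aware input (illustrative sketch)

The backbone consumes structure-aware tokens derived from Foldseek. As a rough illustration, and assuming the convention described in the SaProt documentation (each residue becomes an uppercase amino-acid letter followed by a lowercase 3Di letter, with `#` marking a masked structure state), the pairing can be sketched as below. The helper name `to_structure_aware_sequence` is hypothetical and is not part of the TEDBench or SaProt code; the repository's own preprocessing should be preferred for real inputs.

```python
def to_structure_aware_sequence(aa_seq: str, tokens_3di: str, mask_char: str = "#") -> str:
    """Interleave residues with their 3Di letters.

    Hypothetical helper for illustration only. Assumes `aa_seq` and
    `tokens_3di` are aligned one-to-one, as produced by Foldseek for a
    single chain.
    """
    if len(aa_seq) != len(tokens_3di):
        raise ValueError(
            f"length mismatch: {len(aa_seq)} residues vs {len(tokens_3di)} 3Di tokens"
        )

    pairs = []
    for aa, tok in zip(aa_seq, tokens_3di):
        # Keep the mask character as-is; lowercase real 3Di letters.
        struct = mask_char if tok == mask_char else tok.lower()
        pairs.append(aa.upper() + struct)
    return "".join(pairs)


if __name__ == "__main__":
    # Toy three-residue example; the 3Di string here is made up.
    print(to_structure_aware_sequence("MEV", "DVP"))
```

Each residue contributes two characters, so a chain of length N yields a string of length 2N before tokenization.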