lhallee committed (verified)
Commit b2f6acb · Parent: 84a1e2b
Upload README.md with huggingface_hub
Files changed (1): README.md (+127 −127)
---
library_name: transformers
tags:
- protein language model
- biology
---

# FastANKH

Fast, optimized implementations of the ANKH protein language models (T5-based encoders) with multi-backend attention support.

**Requires PyTorch 2.11+** for Flash Attention 4 (FA4) backend support via flex attention.
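Since the flex/FA4 path depends on the PyTorch version, a small runtime guard can fall back to SDPA on older installs. A minimal sketch (the `flex_supported` helper and the `"2.11"` threshold are illustrative, taken only from the requirement stated above):

```python
import torch

def flex_supported(min_version: str = "2.11") -> bool:
    """Hypothetical helper: True when the installed PyTorch is new enough
    for the flex/FA4 backend, comparing only the (major, minor) prefix."""
    installed = tuple(int(p) for p in torch.__version__.split("+")[0].split(".")[:2])
    required = tuple(int(p) for p in min_version.split(".")[:2])
    return installed >= required

# Fall back to the exact SDPA backend when flex is unavailable.
backend = "flex" if flex_supported() else "sdpa"
```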
## Models

| Model | Params | Layers | Hidden | Heads | Activation | Source |
|-------|--------|--------|--------|-------|------------|--------|
| ANKH_base | 453.3M | 48 | 768 | 12 | gelu_new | ElnaggarLab/ankh-base |
| ANKH_large | 1.15B | 48 | 1536 | 16 | gelu_new | ElnaggarLab/ankh-large |
| ANKH2_large | 1.15B | 24 | 1536 | 16 | silu | ElnaggarLab/ankh2-ext2 |
| ANKH3_large | 1.15B | 48 | 1536 | 16 | silu | ElnaggarLab/ankh3-large |
| ANKH3_xl | 3.49B | 48 | 2560 | 32 | silu | ElnaggarLab/ankh3-xl |
## Usage

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("Synthyra/ANKH_base", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Synthyra/ANKH_base")

# Choose the attention backend.
# Options: "sdpa" (default, exact), "flex" (FA4 on Hopper/Blackwell)
model.config.attn_backend = "flex"

sequences = ["MKTLLILAVL", "ACDEFGHIKLMNPQRSTVWY"]
inputs = tokenizer(sequences, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Per-residue embeddings
embeddings = outputs.last_hidden_state  # (batch, seq_len, hidden_dim)
```
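When a single vector per sequence is needed, the per-residue embeddings can be mean-pooled with the attention mask so padded positions do not contribute. A minimal sketch using dummy tensors in place of `outputs.last_hidden_state` and `inputs["attention_mask"]`:

```python
import torch

# Dummy stand-ins (batch=2, seq_len=4, hidden=8); in practice these come
# from the model call above.
hidden = torch.randn(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])

# Zero out padded positions, then divide by the number of real tokens per sequence.
mask_f = mask.unsqueeze(-1).to(hidden.dtype)                  # (batch, seq_len, 1)
pooled = (hidden * mask_f).sum(dim=1) / mask_f.sum(dim=1).clamp(min=1)

print(pooled.shape)  # torch.Size([2, 8])
```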
## Batch Embedding

```python
model = AutoModel.from_pretrained("Synthyra/ANKH_base", trust_remote_code=True).to("cuda")
embeddings = model.embed_dataset(
    sequences=["MKTLLILAVL", "ACDEFGHIKLMNPQRSTVWY"],
    batch_size=8,
    max_len=512,
    full_embeddings=True,
)
```
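Conceptually, this kind of helper truncates sequences to `max_len`, groups similar lengths together to minimize padding, and processes fixed-size batches. A hypothetical sketch of that batching step only (not the actual `embed_dataset` internals; `make_batches` is an illustrative name):

```python
def make_batches(sequences, batch_size=8, max_len=512):
    """Illustrative length-aware batching: truncate to max_len, sort by
    length so batches pad efficiently, and keep original indices so the
    results can be re-ordered afterwards."""
    indexed = [(i, seq[:max_len]) for i, seq in enumerate(sequences)]
    indexed.sort(key=lambda pair: len(pair[1]))
    for start in range(0, len(indexed), batch_size):
        yield indexed[start:start + batch_size]

seqs = ["MKTLLILAVL", "ACDEFGHIKLMNPQRSTVWY", "MK"]
batches = list(make_batches(seqs, batch_size=2))
```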
## Attention Backends

| Backend | Key | Notes |
|---------|-----|-------|
| SDPA | `"sdpa"` | Default. Exact attention with position bias as additive mask. |
| Flex | `"flex"` | Uses FA4 on Hopper/Blackwell GPUs (PyTorch 2.11+). Position bias computed via `score_mod`. Triton fallback on older hardware. |
| Flash | `"kernels_flash"` | Not supported for ANKH (no arbitrary bias support). Falls back to flex/sdpa. |
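The SDPA row's "position bias as additive mask" can be seen directly: passing a float `attn_mask` to `torch.nn.functional.scaled_dot_product_attention` adds it to the scaled attention scores before the softmax. A toy demonstration with random tensors standing in for the real projections and bias:

```python
import torch
import torch.nn.functional as F

# Toy shapes: (batch=1, heads=2, seq=4, head_dim=8).
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)
bias = torch.randn(1, 2, 4, 4)  # stand-in for the T5 relative position bias

# A float attn_mask is added to the attention scores.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=bias)

# Reference: softmax(q k^T / sqrt(d) + bias) v
scores = q @ k.transpose(-2, -1) / (8 ** 0.5) + bias
ref = torch.softmax(scores, dim=-1) @ v
```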
## Architecture

ANKH models are T5 encoder-only architectures:
- **No absolute position embeddings**: T5-style relative position bias (log-bucketed, bidirectional)
- **RMS LayerNorm**: no mean subtraction, no bias term
- **Gated FFN**: `activation(wi_0(x)) * wi_1(x) -> wo(x)` with gelu_new (v1) or silu (v2/v3)
- **Pre-layer normalization**: norm before attention and FFN, residual added after
- **No bias in projections**: all q/k/v/o and FFN linear layers use `bias=False`
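The RMS LayerNorm and gated FFN bullets can be sketched in a few lines of PyTorch, following standard T5 conventions (module and weight names mirror the `wi_0`/`wi_1`/`wo` notation above; this is an illustration, not the model's actual code):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """T5-style RMS LayerNorm: scale only, no mean subtraction, no bias."""
    def __init__(self, hidden: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden))
        self.eps = eps

    def forward(self, x):
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)

class GatedFFN(nn.Module):
    """Gated FFN: activation(wi_0(x)) * wi_1(x) -> wo(x), all bias-free."""
    def __init__(self, hidden: int, d_ff: int, act=nn.SiLU()):
        super().__init__()
        self.wi_0 = nn.Linear(hidden, d_ff, bias=False)
        self.wi_1 = nn.Linear(hidden, d_ff, bias=False)
        self.wo = nn.Linear(d_ff, hidden, bias=False)
        self.act = act

    def forward(self, x):
        return self.wo(self.act(self.wi_0(x)) * self.wi_1(x))

x = torch.randn(2, 5, 16)
y = GatedFFN(16, 32)(RMSNorm(16)(x))  # pre-norm, then gated FFN
```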
The relative position bias is computed once (materialized as a full tensor) and shared across all encoder layers. For the flex backend, the bias is passed as a `score_mod` closure for optimal performance.
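The log-bucketed, bidirectional bucketing works like this: half the buckets encode the sign of the offset, the near range is mapped exactly, and larger distances are compressed logarithmically up to a cap. A sketch following the standard Hugging Face T5 scheme (default `num_buckets`/`max_distance` values are assumptions, not read from the ANKH configs):

```python
import math
import torch

def relative_position_bucket(relative_position, num_buckets=32, max_distance=128):
    """Map key_pos - query_pos to a bucket id, T5-style (bidirectional)."""
    num_buckets //= 2
    # Sign half: positive offsets land in the upper half of the bucket range.
    buckets = (relative_position > 0).long() * num_buckets
    n = relative_position.abs()
    # First half of each side is exact; beyond that, log-spaced up to max_distance.
    max_exact = num_buckets // 2
    is_small = n < max_exact
    large = max_exact + (
        torch.log(n.float() / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).long()
    large = torch.minimum(large, torch.full_like(large, num_buckets - 1))
    return buckets + torch.where(is_small, n, large)

positions = torch.arange(6)
rel = positions[None, :] - positions[:, None]   # key_pos - query_pos
buckets = relative_position_bucket(rel)         # (6, 6) bucket ids
```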
## Notes

- The `FastAnkhForMaskedLM` variant includes an LM head initialized from the shared embedding weights. The original ANKH models were trained with T5's span corruption objective using an encoder-decoder architecture. This encoder-only MaskedLM head is **not pre-trained for standard MLM** and requires additional fine-tuning.
- ANKH3 models use a vocabulary of 256 tokens (vs 144 for v1/v2) and were trained with dual objectives ([NLU] for embeddings, [S2S] for generation).
## Citations

```bibtex
@article{elnaggar2023ankh,
  title={Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling},
  author={Elnaggar, Ahmed and Essam, Hazem and Salah-Eldin, Wafaa and Moustafa, Walid and Elkerdawy, Mohamed and Rochereau, Charlotte and Rost, Burkhard},
  journal={arXiv preprint arXiv:2301.06568},
  year={2023}
}
```

```bibtex
@article{alsamkary2025ankh3,
  title={Ankh3: Multi-Task Pretraining with Sequence Denoising and Completion Enhances Protein Representations},
  author={Alsamkary, Hazem and Elshaffei, Mohamed and Elkerdawy, Mohamed and Elnaggar, Ahmed},
  journal={arXiv preprint arXiv:2505.20052},
  year={2025}
}
```

```bibtex
@misc{FastPLMs,
  author={Hallee, Logan and Bichara, David and Gleghorn, Jason P.},
  title={FastPLMs: Fast, efficient, protein language model inference from Huggingface AutoModel.},
  year={2024},
  url={https://huggingface.co/Synthyra/ESMplusplus_small},
  doi={10.57967/hf/3726},
  publisher={Hugging Face}
}
```

```bibtex
@article{dong2024flexattention,
  title={Flex Attention: A Programming Model for Generating Optimized Attention Kernels},
  author={Dong, Juechu and Feng, Boyuan and Guessous, Driss and Liang, Yanbo and He, Horace},
  journal={arXiv preprint arXiv:2412.05496},
  year={2024}
}
```

```bibtex
@inproceedings{paszke2019pytorch,
  title={PyTorch: An Imperative Style, High-Performance Deep Learning Library},
  author={Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and K{\"o}pf, Andreas and Yang, Edward and DeVito, Zach and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith},
  booktitle={Advances in Neural Information Processing Systems 32},
  year={2019}
}
```