AliSaadatV committed on
Commit 28bbe23 · verified · 1 Parent(s): c451d12

Add protein_aggregator package and example

Files changed (1)
  1. protein_aggregator/__init__.py +38 -0
protein_aggregator/__init__.py ADDED
@@ -0,0 +1,38 @@
+ """
+ Protein Sequence-Level Prediction with Multiple Token Aggregation Methods.
+
+ Extracts residue embeddings from ESM2 (frozen backbone) and performs
+ sequence-level prediction (e.g., localization) using 6 aggregation strategies:
+
+ 1. Mean pooling
+ 2. Max pooling
+ 3. CLS token
+ 4. GLOT (cosine-similarity token graph)
+ 5. GLOT-Residue (protein residue contact graph via graphein)
+ 6. Covariance pooling
+
+ Reference:
+ - GLOT: "Towards Improved Sentence Representations using Token Graphs" (arXiv:2603.03389)
+ - Covariance Pooling: https://www.goodfire.ai/research/covariance-pooling
+ - Graphein: https://graphein.ai/
+ """
+
+ from .model import ProteinSequenceClassifier
+ from .aggregators import (
+     MeanPooling,
+     MaxPooling,
+     CLSPooling,
+     GLOTPooling,
+     GLOTResidueGraphPooling,
+     CovariancePooling,
+ )
+
+ __all__ = [
+     "ProteinSequenceClassifier",
+     "MeanPooling",
+     "MaxPooling",
+     "CLSPooling",
+     "GLOTPooling",
+     "GLOTResidueGraphPooling",
+     "CovariancePooling",
+ ]
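
For readers unfamiliar with the simpler strategies named in the docstring, the following is a minimal pure-Python sketch of mean, max, CLS, and covariance pooling over a matrix of per-token embeddings. It is not the code in `protein_aggregator/aggregators.py`, which this diff does not show. The function names (`mean_pool`, `max_pool`, `cls_pool`, `covariance_pool`), the 0/1 padding mask, the choice to treat row 0 as the CLS position, and the use of the population covariance (dividing by n) flattened to a vector are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]  # shape: (num_tokens, embedding_dim)


def _kept_rows(tokens: Matrix, mask: List[int]) -> Matrix:
    """Return only the rows whose mask entry is non-zero (i.e. not padding)."""
    kept = [row for row, m in zip(tokens, mask) if m]
    if not kept:
        raise ValueError("mask removes every token")
    return kept


def mean_pool(tokens: Matrix, mask: List[int]) -> List[float]:
    """Average each embedding dimension over the unmasked tokens."""
    kept = _kept_rows(tokens, mask)
    return [sum(col) / len(kept) for col in zip(*kept)]


def max_pool(tokens: Matrix, mask: List[int]) -> List[float]:
    """Take the per-dimension maximum over the unmasked tokens."""
    kept = _kept_rows(tokens, mask)
    return [max(col) for col in zip(*kept)]


def cls_pool(tokens: Matrix) -> List[float]:
    """Use the first token's embedding (assumed here to be the CLS position)."""
    return list(tokens[0])


def covariance_pool(tokens: Matrix, mask: List[int]) -> List[float]:
    """Flattened d x d population covariance of the unmasked token embeddings.

    Assumption: divide by n rather than n - 1, and return all d*d entries.
    """
    kept = _kept_rows(tokens, mask)
    mu = mean_pool(tokens, mask)
    centered = [[x - m for x, m in zip(row, mu)] for row in kept]
    n, d = len(kept), len(mu)
    cov = [
        [sum(r[i] * r[j] for r in centered) / n for j in range(d)]
        for i in range(d)
    ]
    return [value for row in cov for value in row]


# Toy input: four tokens with 2-dimensional embeddings; the last is padding.
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

print(mean_pool(tokens, mask))  # [3.0, 2.0]
print(max_pool(tokens, mask))   # [5.0, 4.0]
print(cls_pool(tokens))         # [1.0, 0.0]
print(len(covariance_pool(tokens, mask)))  # 4
```

Mean, max, and CLS pooling all return a vector of the embedding dimension d, whereas covariance pooling as sketched here returns d*d values, so a classifier head placed on top of it would need a correspondingly larger input layer.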