Commit a0042fc by potteryrage (verified) · 1 parent: efbe114

Add model card

Files changed (1)
  1. README.md +72 -0
README.md ADDED
@@ -0,0 +1,72 @@
+ ---
+ license: mit
+ tags:
+ - synthetic-lethality
+ - gene-encoder
+ - depmap
+ - masked-autoencoder
+ - cancer-biology
+ language: en
+ datasets:
+ - custom
+ pipeline_tag: feature-extraction
+ ---
+
+ # SL-Predict: Frozen MAE Gene Encoder
+
+ Pretrained masked-autoencoder (MAE) gene encoder for cold-start synthetic lethality prediction from DepMap CRISPR screens.
+
+ ## Model Description
+
+ A 3-layer MLP encoder (1206 → 512 → 256 → 256) trained for 200 epochs with MSE loss to reconstruct randomly masked DepMap Chronos dependency profiles (18,531 genes × 1,206 non-K562 cell lines).
+
+ **Key property:** This is the **leak-repaired** checkpoint: the 503-gene union of all downstream cold-start test sets was excluded from pretraining. TOST equivalence testing indicates that downstream AUC does not depend on pretrain-test gene overlap (p_max < 0.0001 at an equivalence margin of ±0.010 AUC).
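
For readers unfamiliar with TOST (two one-sided tests), the sketch below shows the general shape of an equivalence test on paired AUC differences. It is not the author's analysis: the function name `tost_p_max`, the normal approximation (a t-distribution would be more appropriate for 10 seeds), and the toy differences are all assumptions made for illustration. Only the ±0.010 margin comes from the model card.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def tost_p_max(differences, margin):
    """Return the larger of the two one-sided p-values for |mean difference| < margin."""
    n = len(differences)
    if n < 2:
        raise ValueError("need at least two paired differences")
    mean = fmean(differences)
    standard_error = stdev(differences) / sqrt(n)
    if standard_error == 0:
        raise ValueError("differences have zero variance")
    z = NormalDist()  # normal approximation (assumption; see lead-in)
    # H0: mean <= -margin; rejected when the mean sits well above the lower bound.
    p_lower = 1.0 - z.cdf((mean + margin) / standard_error)
    # H0: mean >= +margin; rejected when the mean sits well below the upper bound.
    p_upper = z.cdf((mean - margin) / standard_error)
    return max(p_lower, p_upper)


# Hypothetical per-seed AUC differences, invented purely for illustration.
toy_differences = [0.001, -0.002, 0.0005, 0.0015, -0.001,
                   0.0, 0.002, -0.0015, 0.001, -0.0005]
p_max = tost_p_max(toy_differences, margin=0.010)
print(f"p_max = {p_max:.3g}")
```

Equivalence is claimed only when `p_max` falls below the chosen significance level, which is why the card reports the maximum of the two p-values.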
+
+ ## Performance
+
+ When frozen and combined with LightGBM + confidence weighting on SynLethDB CRISPR/CRISPRi labels:
+
+ | Metric | Value |
+ |--------|-------|
+ | Horlbeck K562 held-out AUC | **0.714 ± 0.018** (10-seed, gene-disjoint) |
+ | vs Published SOTA (SLMGAE) | +0.079 |
+ | vs Label-agreement ceiling | +0.015 |
+
+ ## Usage
+
+ ```python
+ import torch
+
+ # Load checkpoint
+ ckpt = torch.load("mae_encoder_d256_leak_repaired.ckpt", map_location="cpu")
+ state_dict = ckpt["state_dict"]
+
+ # The encoder is the first 3 layers of the MAE
+ # Input: 1206-dim DepMap dependency profile (z-scored)
+ # Output: 256-dim gene embedding
+ ```
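
The snippet above stops at `state_dict` and does not name the encoder's parameters. One way to locate them, sketched below, is to match parameter shapes against the documented layer widths, relying on the fact that PyTorch `nn.Linear` stores its weight as `(out_features, in_features)`. The helper `find_encoder_weights` and the key names in `example_shapes` are hypothetical; the real names have to be read from the checkpoint, and the sketch assumes the encoder layers appear before the decoder layers.

```python
ENCODER_WIDTHS = [1206, 512, 256, 256]  # layer widths from the model description


def find_encoder_weights(param_shapes, widths=ENCODER_WIDTHS):
    """Return parameter names whose 2-D shapes chain through the given widths, in order."""
    # nn.Linear weights are stored as (out_features, in_features).
    expected = [(out_dim, in_dim) for in_dim, out_dim in zip(widths, widths[1:])]
    matches = []
    for name, shape in param_shapes.items():
        if len(matches) < len(expected) and tuple(shape) == expected[len(matches)]:
            matches.append(name)
    if len(matches) != len(expected):
        raise ValueError(f"matched {len(matches)} of {len(expected)} encoder layers")
    return matches


# Hypothetical key names. With the real checkpoint one would build this mapping as
# param_shapes = {name: tuple(t.shape) for name, t in state_dict.items()}
example_shapes = {
    "encoder.0.weight": (512, 1206), "encoder.0.bias": (512,),
    "encoder.2.weight": (256, 512), "encoder.2.bias": (256,),
    "encoder.4.weight": (256, 256), "encoder.4.bias": (256,),
    "decoder.0.weight": (256, 256), "decoder.0.bias": (256,),
    "decoder.2.weight": (512, 256), "decoder.2.bias": (512,),
    "decoder.4.weight": (1206, 512), "decoder.4.bias": (1206,),
}
encoder_weight_names = find_encoder_weights(example_shapes)
print(encoder_weight_names)
```

The activation function between layers is not stated in the model card, so rebuilding the module still requires checking the training code in the linked repository.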
+
+ ## Training Details
+
+ - **Data:** DepMap 26Q1 Chronos dependency profiles
+ - **Architecture:** MLP 1206→512→256→256 (encoder), mirror decoder
+ - **Objective:** Masked autoencoding (50% masking ratio, MSE loss)
+ - **Epochs:** 200
+ - **Hardware:** Single NVIDIA A10G (Modal cloud), ~20 minutes
+ - **Leak repair:** 503 test-split genes excluded from pretraining data
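
To make the objective in the list above concrete, here is a minimal sketch of masking one profile at a 50% ratio and scoring a reconstruction with MSE. Two details are assumptions, not facts from the model card: masked entries are replaced with zeros, and the loss is averaged over masked positions only (the original training may use a different fill value or score every position). The function names and the random toy profile are hypothetical.

```python
import random


def mask_profile(profile, mask_ratio, rng):
    """Zero out a random subset of entries; return the masked copy and the masked indices."""
    n_masked = round(len(profile) * mask_ratio)
    masked_indices = set(rng.sample(range(len(profile)), n_masked))
    masked = [0.0 if i in masked_indices else value for i, value in enumerate(profile)]
    return masked, masked_indices


def masked_mse(reconstruction, target, masked_indices):
    """Mean squared error restricted to the masked positions."""
    squared_errors = [(reconstruction[i] - target[i]) ** 2 for i in masked_indices]
    return sum(squared_errors) / len(squared_errors)


rng = random.Random(0)
# Toy stand-in for one gene's 1206-dim dependency profile (not real DepMap data).
profile = [rng.gauss(0.0, 1.0) for _ in range(1206)]
masked, masked_indices = mask_profile(profile, mask_ratio=0.5, rng=rng)

# A trivial "model" that echoes its masked input; a trained encoder-decoder would replace it.
loss = masked_mse(masked, profile, masked_indices)
print(len(masked_indices), loss)
```

During pretraining, the echo step would be replaced by a forward pass through the encoder and mirror decoder, with the loss backpropagated each step.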
+
+ ## Citation
+
+ ```
+ @misc{large2026slpredict,
+ author = {Large, Jack},
+ title = {Cold-start synthetic lethality prediction: Diagnosing evaluation inflation and a constructive baseline},
+ year = {2026},
+ url = {https://github.com/j8ckfi/sl-predict}
+ }
+ ```
+
+ ## Links
+
+ - **Paper:** [GitHub](https://github.com/j8ckfi/sl-predict)
+ - **Code:** [https://github.com/j8ckfi/sl-predict](https://github.com/j8ckfi/sl-predict)