---
datasets:
- TCGA
library_name: pytorch
license: mit
pipeline_tag: other
tags:
- medical-imaging
- computational-pathology
- survival-analysis
- multimodal
- tcga
- interpretable
---

# ProtoPathway

Pretrained checkpoints, preprocessed cohort data, and the curated pathway graph for **ProtoPathway**, an interpretable-by-design multimodal framework for cancer survival prediction.

- **Paper:** [ProtoPathway: Biologically Structured Prototype-Pathway Fusion for Multimodal Cancer Survival Prediction](https://huggingface.co/papers/2605.21454)
- **Code:** [https://github.com/AmayaGS/ProtoPathway](https://github.com/AmayaGS/ProtoPathway)

## Layout

```
pathways/pathways_base_*.pkl           curated Reactome + Hallmark pathway graph
raw_inputs/                            raw files for re-running preprocessing from scratch
    Reactome/                          Reactome hierarchy files (GMT, relations, names)
    Hallmark/                          MSigDB Hallmark gene sets
    {cohort}/                          rna_clean.csv, clinical CSV, SurvPath splits
cohorts/{cohort}/                      preprocessed cohort data and trained models
    gene_expression.csv                preprocessed expression matrix
    bipartite_graph.pt                 cohort-specific gene-pathway graph
    labels.csv                         survival times, events, and bins
    data_splits.pkl                    5-fold CV splits (SurvPath-compatible)
    checkpoints/best_fold_{0..4}.pt    trained model weights
```
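
`labels.csv` stores survival times, events, and discretized survival bins. The sketch below shows one common way such bins are built for discrete-time survival models. It is a minimal sketch under assumptions: quantile cut points are taken from uncensored patients only and then applied to every patient, there are four bins by default, `events` uses 1 for an observed event, and `survival_bins` is a hypothetical helper. It is not necessarily the rule used to produce `labels.csv`.

```python
from bisect import bisect_right
from statistics import quantiles


def survival_bins(times, events, n_bins=4):
    """Hypothetical helper: assign each patient a discrete survival bin.

    Assumes quantile binning over uncensored patients (events == 1); the
    resulting cut points are then applied to all patients, censored or not.
    """
    event_times = sorted(t for t, e in zip(times, events) if e == 1)
    if len(event_times) < n_bins:
        raise ValueError("need at least n_bins uncensored patients")
    # n_bins - 1 interior cut points from the uncensored time distribution
    cuts = quantiles(event_times, n=n_bins, method="inclusive")
    # bisect_right counts the cuts <= t, giving a bin index in [0, n_bins - 1];
    # times below the first cut fall in bin 0, times above the last in the top bin
    return [bisect_right(cuts, t) for t in times]
```

Because the cut points come only from patients with an observed event, censored patients with very long or very short follow-up still land in the lowest or highest bin rather than shifting the edges.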

## Cohorts

Five TCGA cohorts: BRCA (N=714), BLCA (N=359), COADREAD (N=227),
HNSC (N=392), STAD (N=318). Gene expression data come from the preprocessed
SurvPath release. Whole-slide image (WSI) patch features, extracted with
UNI2-h, are not redistributed here; the UNI2-h encoder should be obtained
directly from the [Mahmood Lab](https://huggingface.co/MahmoodLab/UNI2-h).
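
To fetch several of these cohorts in one call, the `allow_patterns` argument of `huggingface_hub.snapshot_download` can be built programmatically. A minimal sketch, assuming every cohort directory follows the `cohorts/TCGA-<CODE>/` naming of the existing `cohorts/TCGA-BLCA/` directory (only the BLCA path is confirmed by this card); `allow_patterns_for` and `COHORT_SIZES` are hypothetical names.

```python
# Cohort sample sizes as listed in this card.
COHORT_SIZES = {"BRCA": 714, "BLCA": 359, "COADREAD": 227, "HNSC": 392, "STAD": 318}


def allow_patterns_for(codes):
    """Hypothetical helper: glob patterns for the given cohorts plus the pathway file.

    Assumes each cohort lives under cohorts/TCGA-<CODE>/, as TCGA-BLCA does.
    """
    unknown = [c for c in codes if c not in COHORT_SIZES]
    if unknown:
        raise ValueError(f"unknown cohort code(s): {unknown}")
    # One pattern per cohort directory, plus the shared pathway graph
    return [f"cohorts/TCGA-{c}/*" for c in codes] + ["pathways/*"]
```

Passing `allow_patterns_for(["BRCA", "STAD"])` as `allow_patterns` to `snapshot_download` would fetch those two cohorts and the shared pathway file; across all five cohorts the listed sizes sum to 2,010 patients.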

## Quick load

```python
from huggingface_hub import snapshot_download

# Everything for one cohort plus the shared pathway file
snapshot_download(
    repo_id="AmayaGS/ProtoPathway",
    local_dir="./assets",
    allow_patterns=["cohorts/TCGA-BLCA/*", "pathways/*"],
)
```
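
After a download, it can be useful to confirm that the expected per-cohort files are present before training or evaluation. A minimal sketch that checks a local copy against the file names listed in the Layout section; `missing_files` and `COHORT_FILES` are hypothetical names, and the sketch assumes all five fold checkpoints are shipped for every cohort.

```python
from pathlib import Path

# Per-cohort files as listed in the Layout section of this card.
COHORT_FILES = [
    "gene_expression.csv",
    "bipartite_graph.pt",
    "labels.csv",
    "data_splits.pkl",
    *[f"checkpoints/best_fold_{k}.pt" for k in range(5)],
]


def missing_files(root, cohort):
    """Hypothetical helper: list expected files absent from a local download."""
    cohort_dir = Path(root) / "cohorts" / cohort
    missing = [rel for rel in COHORT_FILES if not (cohort_dir / rel).is_file()]
    # The shared pathway graph is only named by a glob, so accept any match
    pathways_dir = Path(root) / "pathways"
    if not (pathways_dir.is_dir() and any(pathways_dir.glob("pathways_base_*.pkl"))):
        missing.append("pathways/pathways_base_*.pkl")
    return missing
```

With the snapshot above, `missing_files("./assets", "TCGA-BLCA")` should return an empty list once the download has completed.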

## Citation

```bibtex
@article{protopathway2026,
  title   = {ProtoPathway: Biologically Structured Prototype-Pathway Fusion for Multimodal Cancer Survival Prediction},
  author  = {Amaya Gallagher-Syed and Costantino Pitzalis and Myles J. Lewis and Michael R. Barnes and Gregory Slabaugh},
  journal = {arXiv preprint arXiv:2605.21454},
  year    = {2026},
}
```