PsychGNN Masked-Edge Imputation Model
Model summary
This repository contains a heterogeneous graph neural network trained to recover masked SNP-disorder links within a psychiatric cross-disorder graph.
The model is intended for variant-level research use. It does not predict patient-level diagnosis, prognosis, or treatment response.
Intended task
The training and evaluation task is disorder-conditional masked-edge imputation:
- start from a graph containing SNP, gene, and disorder nodes
- remove a subset of observed SNP-disorder links from message passing
- retain the same SNP's remaining cross-disorder context where available
- predict whether the hidden SNP-disorder link should exist
This task answers a specific question: given a psychiatric variant already represented in the graph, can the model recover its missing cross-disorder links from the rest of the graph structure?
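The masking step described above can be sketched as follows. This is a minimal illustration and not the repository's training code: the helper name `split_masked_edges`, the 20% mask fraction, and the toy edge list are all assumptions made for the example.

```python
import random

def split_masked_edges(snp_disorder_edges, mask_fraction=0.2, seed=0):
    """Split observed SNP-disorder edges into visible and masked sets.

    Hypothetical helper: masked edges are hidden from message passing and
    used only as positive prediction targets. The mask fraction is an
    assumed value, not one read from the checkpoint.
    """
    rng = random.Random(seed)
    edges = list(snp_disorder_edges)
    rng.shuffle(edges)
    n_masked = int(len(edges) * mask_fraction)
    masked = edges[:n_masked]    # links the model must recover
    visible = edges[n_masked:]   # links the encoder may still use
    return visible, masked

# Toy (snp_id, disorder) pairs, invented for illustration.
edges = [
    ("rs1", "SCZ"), ("rs1", "BIP"), ("rs2", "MDD"), ("rs2", "ANX"),
    ("rs3", "SCZ"), ("rs3", "MDD"), ("rs4", "ADHD"), ("rs4", "BIP"),
    ("rs5", "SCZ"), ("rs5", "ANX"),
]
visible, masked = split_masked_edges(edges, mask_fraction=0.2, seed=0)
```

With a random split like this, a masked SNP usually keeps its other disorder links in `visible`, which is the "remaining cross-disorder context" the task relies on.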
Data provenance
The checkpoint was trained on:
- harmonized dataset: lighteternal/pgc-psychiatric-gwas-harmonized
- graph artifact: lighteternal/psychgnn-psychiatric-graph
The harmonized dataset was derived from public OpenMed / PGC Hugging Face repositories, including:
- OpenMed/pgc-adhd
- OpenMed/pgc-anxiety
- OpenMed/pgc-autism
- OpenMed/pgc-bipolar
- OpenMed/pgc-eating-disorders
- OpenMed/pgc-mdd
- OpenMed/pgc-ocd-tourette
- OpenMed/pgc-other
- OpenMed/pgc-ptsd
- OpenMed/pgc-schizophrenia
- OpenMed/pgc-substance-use
Scope
The underlying graph contains 11 modeled disorder groups:
- ADHD
- Anxiety
- Autism
- Bipolar disorder
- Borderline personality disorder
- Eating disorders
- Major depressive disorder
- Obsessive-compulsive disorder
- Post-traumatic stress disorder
- Schizophrenia
- Substance use
At inference time the checkpoint can score any of these disorders for variants present in the graph artifact.
Architecture
The model is a custom heterogeneous GraphSAGE-style network over three node types:
- SNP nodes
- gene nodes
- disorder nodes
Message passing uses:
- SNP self-updates
- SNP-to-gene aggregation
- gene self-updates
- gene-to-SNP aggregation
- disorder self-updates
- SNP-to-disorder aggregation
This release uses no disorder-disorder edges.
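As a rough illustration of one such relation, the sketch below applies a GraphSAGE-style update to a disorder node using its SNP neighbours. The mean aggregator, the ReLU activation, the identity weight matrices, and the function names are assumptions chosen for clarity; the checkpoint's actual layers may differ.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mean_vectors(vectors, dim):
    """Element-wise mean of a list of vectors; zeros if there are no neighbours."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def sage_update(h_self, neighbor_feats, W_self, W_neigh):
    """One update: ReLU(W_self @ h_self + W_neigh @ mean(neighbor_feats))."""
    agg = mean_vectors(neighbor_feats, len(W_neigh[0]))
    z = [a + b for a, b in zip(matvec(W_self, h_self), matvec(W_neigh, agg))]
    return [max(0.0, v) for v in z]

# Toy SNP-to-disorder aggregation with 2-dimensional features.
snp_h = {"rs1": [1.0, 0.0], "rs2": [3.0, 2.0]}
disorder_h = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder weights, not learned values
updated = sage_update(disorder_h, list(snp_h.values()), identity, identity)
```

Here the neighbour mean is `[2.0, 1.0]`, so the pre-activation is `[2.5, 0.0]` and the ReLU leaves it unchanged.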
Decoder heads
The checkpoint contains two task heads:
- a bilinear link decoder for SNP-disorder link scoring
- an effect-size regression head for predicting a normalized effect estimate on positive edges
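A bilinear link decoder scores a pair of embeddings through a learned matrix. The sketch below shows the general form, assuming a sigmoid over the bilinear logit and no bias term; both choices, and the identity matrix used for `W`, are illustrative assumptions rather than details read from the checkpoint.

```python
import math

def bilinear_score(snp_emb, disorder_emb, W):
    """Return sigmoid(snp_emb^T W disorder_emb) as a link probability."""
    logit = sum(
        snp_emb[i] * W[i][j] * disorder_emb[j]
        for i in range(len(snp_emb))
        for j in range(len(disorder_emb))
    )
    return 1.0 / (1.0 + math.exp(-logit))

W = [[1.0, 0.0], [0.0, 1.0]]  # placeholder weights, not learned values
aligned = bilinear_score([1.0, 2.0], [1.0, 2.0], W)    # logit = 5
opposed = bilinear_score([1.0, 2.0], [-1.0, -2.0], W)  # logit = -5
```

Embeddings that align under `W` receive a score near 1, and opposed embeddings a score near 0.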
Training configuration
Best hyperparameters:
- hidden dimension: 192
- layers: 3
- dropout: 0.2
- learning rate: 7.5e-4
- weight decay: 1e-5
- negative sampling ratio: 1
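A negative sampling ratio of 1 means one unobserved pair is drawn for each positive edge. The sketch below assumes uniform sampling over all unobserved (SNP, disorder) pairs; the actual sampling scheme is not documented here, and the helper name and toy data are hypothetical.

```python
import random

def sample_negatives(positive_edges, snps, disorders, ratio=1, seed=0):
    """Draw `ratio` unobserved (snp, disorder) pairs per positive edge."""
    rng = random.Random(seed)
    observed = set(positive_edges)
    candidates = [
        (s, d) for s in snps for d in disorders if (s, d) not in observed
    ]
    n_target = ratio * len(positive_edges)
    # Cap at the number of available unobserved pairs.
    return rng.sample(candidates, min(n_target, len(candidates)))

# Toy data, invented for illustration.
snps = ["rs1", "rs2", "rs3", "rs4"]
disorders = ["SCZ", "BIP", "MDD"]
positives = [("rs1", "SCZ"), ("rs2", "BIP"), ("rs3", "MDD"), ("rs4", "SCZ")]
negatives = sample_negatives(positives, snps, disorders, ratio=1, seed=0)
```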
Checkpoint metadata:
- SNP feature dimension: 7
- gene feature dimension: 4
- disorder feature dimension: 5
- effect normalization mean: -0.001849
- effect normalization std: 0.094672
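The effect normalization constants can be used to move between the raw and normalized effect scales. The sketch below assumes a standard z-score transform, which the mean/std pair suggests but the checkpoint metadata does not state explicitly; the function names are hypothetical.

```python
EFFECT_MEAN = -0.001849  # effect normalization mean from the checkpoint metadata
EFFECT_STD = 0.094672    # effect normalization std from the checkpoint metadata

def normalize_effect(raw_effect):
    """Map a raw effect estimate to the normalized training scale (assumed z-score)."""
    return (raw_effect - EFFECT_MEAN) / EFFECT_STD

def denormalize_effect(normalized_effect):
    """Map an effect-size head output back to the raw effect scale."""
    return normalized_effect * EFFECT_STD + EFFECT_MEAN

raw = 0.05
roundtrip = denormalize_effect(normalize_effect(raw))
```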
Graph context
Graph metadata for this release:
- variants: 18,979
- genes: 1,205
- disorders: 11
- SNP-disorder edges: 22,687
- SNP-gene edges: 65,634
- disorder-disorder edges: 0
- genome-wide significance (GWS) threshold for graph edge construction: 5e-8
- SNP-gene positional window: 100,000 bp
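The two construction thresholds can be read as simple filters. The sketch below is an assumption about how they might be applied: it treats the GWS threshold as a strict p-value cutoff and measures the positional window from the gene's start and end coordinates on the same chromosome. The graph artifact may define either rule differently, and the function names are hypothetical.

```python
GWS_THRESHOLD = 5e-8   # p-value cutoff for SNP-disorder edges
WINDOW_BP = 100_000    # positional window for SNP-gene edges

def is_gws(p_value, threshold=GWS_THRESHOLD):
    """True if an association passes the genome-wide significance cutoff."""
    return p_value < threshold

def snp_near_gene(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                  window=WINDOW_BP):
    """True if the SNP lies within `window` bp of the gene body (assumed rule)."""
    if snp_chrom != gene_chrom:
        return False
    return gene_start - window <= snp_pos <= gene_end + window

# Toy coordinates, invented for illustration.
inside = snp_near_gene("1", 1_050_000, "1", 1_100_000, 1_200_000)
outside = snp_near_gene("1", 900_000, "1", 1_100_000, 1_200_000)
```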
Evaluation
Primary masked-edge benchmark:
- test AUROC: 0.9805
- test AP: 0.9818
- effect-size Pearson r: 0.9379
- best validation AUROC: 0.9759
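For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen positive edge is scored above a randomly chosen negative edge, with ties counted as half. The pairwise sketch below illustrates that definition on invented scores; it is not the evaluation code behind the numbers above, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) AUROC: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy labels (1 = true masked edge, 0 = sampled negative) and link scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
toy_auroc = auroc(labels, scores)          # 3 of 4 positive-negative pairs ranked correctly
constant_auroc = auroc(labels, [0.5] * 4)  # a scorer that cannot rank pairs
```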
Per-disorder benchmark coverage:
| Disorder | AUROC | AP | Test edges |
|---|---|---|---|
| ADHD | 0.9707 | 0.9698 | 56 |
| Anxiety | 0.9907 | 0.9920 | 350 |
| Bipolar | 0.9929 | 0.9919 | 544 |
| MDD | 0.9899 | 0.9898 | 244 |
| Schizophrenia | 0.9623 | 0.9674 | 604 |
The following disorders are excluded from the per-disorder table because the current graph did not provide enough stable maskable positive edges for this benchmark:
- Autism
- Borderline personality disorder
- Eating disorders
- Obsessive-compulsive disorder
- Post-traumatic stress disorder
- Substance use
Baseline comparison
| Baseline | Test AUROC | Test AP |
|---|---|---|
| Disorder prevalence | 0.5000 | 0.5000 |
| Variant degree | 0.5661 | 0.6184 |
| Additive prior | 0.5430 | 0.5402 |
| Low-rank SVD | 0.5994 | 0.6413 |
Inputs and outputs
Inputs
The checkpoint expects:
- SNP feature matrix
- gene feature matrix
- disorder feature matrix
- SNP-gene edge index
- SNP-disorder edge index
- variant and disorder mappings
These are provided by the associated public graph artifact.
Outputs
For a scored (variant, disorder) pair, the model produces:
- a link score indicating whether the SNP-disorder edge should exist under the masked-edge task
- an effect-size prediction on the normalized training scale
The primary output of this release is the masked-edge link score.
How to use
Minimal checkpoint loading:
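```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub (cached locally).
ckpt_path = hf_hub_download(
    "lighteternal/psychgnn-masked-edge-imputation-model",
    "model.pt",
    repo_type="model",
)

# weights_only=False unpickles arbitrary Python objects, so only load
# checkpoints from a source you trust.
checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=False)

print(checkpoint["hyperparams"])   # training hyperparameters
print(checkpoint["feature_dims"])  # SNP / gene / disorder feature dimensions
print(checkpoint["report"]["masked_split"]["test_auroc"])  # headline test AUROC
```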
To run inference, instantiate a heterogeneous GraphSAGE-style model matching the architecture above, load checkpoint["state_dict"], and score variant-disorder pairs against lighteternal/psychgnn-psychiatric-graph.
Files in this repository
- model.pt
- evaluation_report.json
Limitations
- This model is evaluated on a masked-edge recovery task, not on patient outcomes.
- The headline benchmark covers only disorders with enough maskable positive edges under the current graph construction.
- The graph uses a strict GWS edge definition; this reduces benchmark coverage for some disorders.
- The checkpoint should not be interpreted as evidence of robust generalization to completely unseen disorders.
- The effect-size head is trained on harmonized summary-statistics edges and does not constitute a causal estimate.
Appropriate use
Reasonable uses:
- cross-disorder variant prioritization
- exploratory pleiotropy analysis
- follow-up prioritization for psychiatric genetics studies
- downstream research tooling built on the published graph
Inappropriate uses:
- clinical decision-making
- diagnosis or screening
- patient-level risk prediction
- treatment selection