PsychGNN All-Tested-Pairs Model

Model summary

This repository contains a heterogeneous graph neural network trained on every tested SNP-disorder pair in the published psychiatric graph panel.

The model predicts whether a tested SNP-disorder pair is:

null, or
non-null (suggestive or genome-wide significant)

This is a variant-level research model. It is not a clinical model.

Intended task

The training target is all-tested-pairs cross-disorder association prediction:

take the fixed SNP panel used in the graph artifact
collect every tested SNP-disorder pair on that panel from the harmonized dataset
label pairs as positive if sig_tier ∈ {suggestive, gws}
label pairs as negative if sig_tier == null
train the graph model to distinguish non-null from null tested pairs across all 11 disorders
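
The labeling rule above can be sketched in a few lines of Python. The tier names come from this card; the record layout and variant ids are hypothetical placeholders, not the harmonized dataset's actual schema.

```python
# Tier names follow the card; the record layout below is a hypothetical example.
POSITIVE_TIERS = {"suggestive", "gws"}
NEGATIVE_TIER = "null"


def binary_label(sig_tier: str) -> int:
    """Map a significance tier to the binary training label."""
    if sig_tier in POSITIVE_TIERS:
        return 1
    if sig_tier == NEGATIVE_TIER:
        return 0
    raise ValueError(f"unexpected sig_tier: {sig_tier!r}")


# Hypothetical tested pairs: (variant id, disorder, sig_tier).
tested_pairs = [
    ("rs0000001", "Schizophrenia", "gws"),
    ("rs0000002", "ADHD", "suggestive"),
    ("rs0000003", "PTSD", "null"),
]
labels = [binary_label(tier) for _, _, tier in tested_pairs]
print(labels)  # [1, 1, 0]
```

Raising on an unknown tier, rather than silently defaulting to negative, keeps mislabeled rows from leaking into the null class.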

Held-out GWS edges are removed from message passing so the graph encoder does not see the target edge directly.
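
As an illustration of that masking step, the sketch below filters held-out edges out of the edge list before message passing. It assumes edges are stored as (SNP index, disorder index) tuples; the function name and toy indices are hypothetical, and the release may implement the same idea differently on tensor edge indices.

```python
def message_passing_edges(all_edges, held_out_edges):
    """Return the edges the encoder may use: everything except held-out targets."""
    held_out = set(held_out_edges)
    return [edge for edge in all_edges if edge not in held_out]


# Hypothetical (snp_index, disorder_index) GWS edges.
gws_edges = [(0, 0), (0, 1), (1, 1), (2, 0)]
held_out = [(0, 1), (2, 0)]  # edges reserved for validation/test scoring

visible = message_passing_edges(gws_edges, held_out)
print(visible)  # [(0, 0), (1, 1)]
```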

Data provenance

The checkpoint was trained on:

harmonized dataset: lighteternal/pgc-psychiatric-gwas-harmonized
graph artifact: lighteternal/psychgnn-psychiatric-graph

The harmonized dataset was derived from public OpenMed / PGC Hugging Face repositories.

Scope

The modeled disorder panel contains 11 disorder groups:

ADHD
Anxiety
Autism
Bipolar disorder
Borderline personality disorder
Eating disorders
Major depressive disorder
Obsessive-compulsive disorder
Post-traumatic stress disorder
Schizophrenia
Substance use

The evaluation in this release covers all 11 disorders.

Architecture

The encoder is a heterogeneous GraphSAGE-style architecture over:

SNP nodes
gene nodes
disorder nodes

This release uses:

hidden dimension: 128
layers: 2
dropout: 0.15

Decoder heads:

bilinear SNP-disorder link decoder for binary classification
auxiliary effect-size regression head for non-null edges

No disorder-disorder edges are used in the graph for this release.
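
To make the bilinear link decoder concrete, here is a minimal plain-Python sketch of the score z_snp^T W z_disorder followed by a sigmoid. The toy 2-dimensional embeddings and identity weight matrix are illustrative assumptions; whether the released decoder adds a bias term or applies the sigmoid inside the model is not stated in this card.

```python
import math


def bilinear_logit(z_snp, weight, z_disorder):
    """Compute z_snp^T W z_disorder for one (SNP, disorder) pair."""
    projected = [sum(w * d for w, d in zip(row, z_disorder)) for row in weight]
    return sum(s * p for s, p in zip(z_snp, projected))


def sigmoid(x):
    """Squash a logit into a (0, 1) link score."""
    return 1.0 / (1.0 + math.exp(-x))


# Toy 2-dimensional embeddings; the release uses hidden dimension 128.
z_snp = [1.0, 2.0]
z_disorder = [0.5, -1.0]
weight = [[1.0, 0.0],
          [0.0, 1.0]]  # identity W reduces the score to a dot product

logit = bilinear_logit(z_snp, weight, z_disorder)
probability = sigmoid(logit)
print(logit, round(probability, 4))
```

With a learned, non-identity W, the decoder can weight interactions between different embedding dimensions, which a plain dot product cannot.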

Training configuration

Best hyperparameters:

hidden dimension: 128
layers: 2
dropout: 0.15
learning rate: 5e-4
weight decay: 1e-5

Checkpoint metadata:

SNP feature dimension: 7
gene feature dimension: 4
disorder feature dimension: 5

The positive class is defined as suggestive ∪ gws, so this model is trained as a binary tested-pair classifier rather than an ordinal multi-tier model.

Graph context

Graph metadata for this release:

variants: 18,979
genes: 1,205
disorders: 11
SNP-disorder edges: 22,687
SNP-gene edges: 65,634
disorder-disorder edges: 0
GWS threshold for graph construction: 5e-8
SNP-gene positional window: 100,000 bp
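
The positional window can be read as follows: a SNP is linked to a gene when it falls within 100,000 bp of that gene. The sketch below assumes the window is measured from the gene's start and end coordinates; the card does not say whether the graph artifact uses gene boundaries or another reference point, and the gene records here are hypothetical.

```python
WINDOW_BP = 100_000  # SNP-gene positional window from the graph metadata


def genes_near_snp(snp_chrom, snp_pos, genes, window=WINDOW_BP):
    """Return names of genes whose span, padded by `window`, contains the SNP."""
    hits = []
    for name, chrom, start, end in genes:
        if chrom == snp_chrom and start - window <= snp_pos <= end + window:
            hits.append(name)
    return hits


# Hypothetical gene records: (name, chromosome, start, end).
genes = [
    ("GENE_A", "1", 1_000_000, 1_050_000),
    ("GENE_B", "1", 1_200_000, 1_260_000),
    ("GENE_C", "2", 1_000_000, 1_050_000),
]
print(genes_near_snp("1", 1_120_000, genes))  # ['GENE_A', 'GENE_B']
```

A single SNP can map to several genes under this rule, which is consistent with the graph having more SNP-gene edges than variants.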

Evaluation

Primary all-tested-pairs benchmark:

test AUROC: 0.9817
test AP: 0.9276
macro AUROC: 0.9814
macro AP: 0.8647
effect-size Pearson r on non-null test edges: 0.9883
best validation AP: 0.9223

Per-disorder results:

Disorder	AUROC	AP	Test pairs	Positive test pairs
ADHD	0.9860	0.9109	2188	239
Anxiety	0.9665	0.9250	2146	632
Autism	0.9978	0.9043	2473	53
Bipolar	0.9605	0.9309	2240	675
BPD	0.9830	0.6608	1921	49
Eating disorders	0.9985	0.8026	2483	4
MDD	0.9754	0.8904	2569	336
OCD	0.9427	0.6659	2148	80
PTSD	0.9998	0.8333	2491	2
Schizophrenia	0.9866	0.9957	2732	2086
Substance use	0.9982	0.9916	679	100

Eating disorders (4) and PTSD (2) have very few positive test pairs, and Autism (53) and BPD (49) have only a few dozen, so their AP values should be interpreted cautiously.
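
The macro metrics reported above are consistent with an unweighted mean over the 11 per-disorder rows. The short check below reproduces them from the table values; it is a verification sketch, not the evaluation code used for this release.

```python
# (AUROC, AP) per disorder, copied from the per-disorder results table.
per_disorder = {
    "ADHD": (0.9860, 0.9109),
    "Anxiety": (0.9665, 0.9250),
    "Autism": (0.9978, 0.9043),
    "Bipolar": (0.9605, 0.9309),
    "BPD": (0.9830, 0.6608),
    "Eating disorders": (0.9985, 0.8026),
    "MDD": (0.9754, 0.8904),
    "OCD": (0.9427, 0.6659),
    "PTSD": (0.9998, 0.8333),
    "Schizophrenia": (0.9866, 0.9957),
    "Substance use": (0.9982, 0.9916),
}

# Macro averaging weights every disorder equally, regardless of test-pair count.
macro_auroc = sum(auroc for auroc, _ in per_disorder.values()) / len(per_disorder)
macro_ap = sum(ap for _, ap in per_disorder.values()) / len(per_disorder)
print(round(macro_auroc, 4), round(macro_ap, 4))  # 0.9814 0.8647
```

Because each disorder counts equally, the low AP values for BPD and OCD pull macro AP well below the pooled test AP.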

Baseline comparison

Baseline	Test AUROC	Test AP
Disorder prevalence	0.8852	0.6022
Variant prevalence	0.3740	0.1616
Additive prior	0.8170	0.5460
Low-rank SVD	0.5099	0.2408
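
The card does not define these baselines. As one plausible reading, a disorder-prevalence baseline would score every pair by the training-set positive rate of its disorder. The sketch below illustrates that assumed definition with hypothetical toy data; the function name is illustrative and the released baseline may differ.

```python
from collections import defaultdict


def fit_disorder_prevalence(train_pairs):
    """Estimate the positive rate per disorder from (disorder, label) training pairs."""
    counts = defaultdict(lambda: [0, 0])  # disorder -> [positives, total]
    for disorder, label in train_pairs:
        counts[disorder][0] += label
        counts[disorder][1] += 1
    return {d: pos / total for d, (pos, total) in counts.items()}


# Hypothetical training pairs: (disorder, binary label).
train_pairs = [
    ("Schizophrenia", 1), ("Schizophrenia", 1), ("Schizophrenia", 0),
    ("PTSD", 0), ("PTSD", 0), ("PTSD", 0), ("PTSD", 1),
]
prevalence = fit_disorder_prevalence(train_pairs)

# Every test pair for a disorder receives the same score under this baseline.
scores = [prevalence[d] for d in ("Schizophrenia", "PTSD")]
print(scores)
```

Such a baseline cannot rank pairs within a disorder, so any AUROC it achieves comes from differences in positive rate between disorders.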

Class balance on the graph SNP panel

ADHD: 1,428 gws, 168 suggestive, 12,994 null
Anxiety: 2,729 gws, 1,486 suggestive, 10,097 null
Autism: 93 gws, 262 suggestive, 16,138 null
Bipolar: 3,091 gws, 1,412 suggestive, 10,437 null
BPD: 135 gws, 193 suggestive, 12,484 null
Eating disorders: 7 gws, 25 suggestive, 16,533 null
MDD: 1,019 gws, 1,225 suggestive, 14,888 null
OCD: 35 gws, 501 suggestive, 13,793 null
PTSD: 15 gws, 16,599 null
Schizophrenia: 13,474 gws, 436 suggestive, 4,307 null
Substance use: 661 gws, 11 suggestive, 3,862 null
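
To show how uneven the positive rate is across disorders, the sketch below converts three rows of the list above into positive fractions, using the positive = suggestive or gws definition from this card. Only rows with all three counts listed are used.

```python
# (gws, suggestive, null) counts copied from the class-balance list for three disorders.
counts = {
    "ADHD": (1_428, 168, 12_994),
    "Eating disorders": (7, 25, 16_533),
    "Schizophrenia": (13_474, 436, 4_307),
}

positive_fraction = {}
for disorder, (gws, suggestive, null) in counts.items():
    positives = gws + suggestive  # positive class is suggestive or gws
    total = positives + null
    positive_fraction[disorder] = positives / total
    print(f"{disorder}: {positives}/{total} = {positive_fraction[disorder]:.4f}")
```

On these counts, roughly 11% of ADHD pairs, 0.2% of eating-disorder pairs, and 76% of schizophrenia pairs are positive.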

Inputs and outputs

Inputs

The checkpoint expects:

SNP feature matrix
gene feature matrix
disorder feature matrix
SNP-gene edge index
SNP-disorder edge index
variant and disorder mappings

These are provided by the associated public graph artifact.

Outputs

For a scored (variant, disorder) pair, the model produces:

a binary link score for non-null vs null
an auxiliary effect-size estimate for non-null edges

The binary link score is the primary output of this release.

How to use

Minimal checkpoint loading:

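import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint file from the model repository.
ckpt_path = hf_hub_download(
    "lighteternal/psychgnn-all-tested-pairs-model",
    "model.pt",
    repo_type="model",
)
# weights_only=False unpickles arbitrary Python objects, so load only checkpoints you trust.
checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# Inspect the stored hyperparameters, feature dimensions, and reported macro AP.
print(checkpoint["hyperparams"])
print(checkpoint["feature_dims"])
print(checkpoint["report"]["all_tested_split"]["macro_ap"])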

To run inference, instantiate a heterogeneous GraphSAGE-style model matching the architecture above, load the checkpoint state dict, and score variant-disorder pairs against lighteternal/psychgnn-psychiatric-graph.

Files in this repository

model.pt
evaluation_report.json

Limitations

Positive class combines suggestive and genome-wide significant associations into one label.
Several disorders have very small positive test counts even though all 11 are represented.
The graph is built from GWS SNP-disorder edges, so encoder context is narrower than the full harmonized dataset.
This release does not address zero-shot generalization to unseen disorders.
The auxiliary effect head is not a causal estimate.

Appropriate use

Reasonable uses:

broad cross-disorder variant scoring across the 11 modeled disorders
ranking tested variant-disorder pairs for follow-up
exploratory psychiatric genetics analysis
comparison against simpler non-graph baselines

Inappropriate uses:

patient-level prediction
clinical interpretation
screening or diagnosis
treatment selection
