---
license: mit
language:
  - en
library_name: pytorch
pipeline_tag: feature-extraction
tags:
  - cgm
  - continuous-glucose-monitor
  - self-supervised-learning
  - jepa
  - time-series
  - masked-prediction
  - biosignal
  - healthcare
  - pretrained-encoder
---

# CGM-JEPA Pretrained Encoders

Frozen self-supervised encoder weights from the paper *CGM-JEPA: Learning Consistent Continuous Glucose Monitor Representations via Predictive Self-Supervised Pretraining*. This repo contains the exact checkpoints used to produce Tables 1–8: the paper's two main contributions (CGM-JEPA, X-CGM-JEPA) and the two re-pretrained baselines (GluFormer, TS2Vec).

Companion resources:

- Pretraining dataset: `CRUISEResearchGroup/CGM-JEPA-Pretraining`
- Labeled downstream splits: `CRUISEResearchGroup/CGM-JEPA-Downstream`
- Code: https://github.com/cruiseresearchgroup/CGM-JEPA

MOMENT and Mantis are not redistributed here. Those baselines are loaded directly from their upstream HF repos (`AutonLab/MOMENT-1-{small,large}`, `paris-noah/Mantis-8M`) by the eval pipeline.

## Quick start

```bash
huggingface-cli download CRUISEResearchGroup/CGM-JEPA --local-dir Output
```
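If you prefer staying in Python, `huggingface_hub.snapshot_download` produces the same layout:

```python
from huggingface_hub import snapshot_download

# Equivalent to the CLI call above; downloads the whole repo into Output/
snapshot_download(repo_id="CRUISEResearchGroup/CGM-JEPA", local_dir="Output")
```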

Then from the code repository:

```bash
# Reproduce paper Tables 1–6
python scripts/run_all_eval.py
```

The downstream eval will load all four checkpoints automatically from the subdirectories below.

## Layout

```
.
├── cgm_jepa/
│   ├── model.safetensors
│   └── config.json
├── x_cgm_jepa/
│   ├── model.safetensors
│   └── config.json
└── baselines/
    ├── gluformer.pt
    └── ts2vec.pkl
```

`cgm_jepa/` and `x_cgm_jepa/` use the standard `PyTorchModelHubMixin` layout (`model.safetensors` for weights, `config.json` for architecture hyperparameters), so they load via the standard `from_pretrained` one-liner (see Loading examples).

`baselines/gluformer.pt` is `{"encoder": state_dict}` and `baselines/ts2vec.pkl` is a full pickled `TS2Vec` model object (per the upstream library's convention). Their architectures are documented in the Architectures section.

## Important note on the baselines

`gluformer.pt` and `ts2vec.pkl` are not vendored from upstream releases of those methods. They were re-pretrained on the same open CGM corpus and compute budget as CGM-JEPA / X-CGM-JEPA (Stanford + Colas, 101 epochs, batch size 128, learning rate 1e-4, seed 43), so that the comparison in the paper isolates the pretraining objective rather than mixing in corpus or compute differences. Use these checkpoints when reproducing paper numbers; for other settings, prefer the original authors' releases.

## Architectures

### `cgm_jepa/` and `x_cgm_jepa/`

Both use the same `models.encoder.Encoder` class with identical hyperparameters; only the pretraining objective differs. At downstream / inference time only the temporal encoder is used, so the two checkpoints are drop-in interchangeable.

| Field | Value |
|---|---|
| `patch_size` | 12 |
| `encoder_kernel_size` | 3 |
| `encoder_embed_dim` | 96 |
| `encoder_embed_bias` | `True` |
| `encoder_nhead` | 6 |
| `encoder_num_layers` | 3 |
| `encoder_dropout` | 0.0 |

Input: a tensor of shape `(B, num_patches, patch_size)` (raw glucose values, z-scored). Output: per-patch embeddings of shape `(B, num_patches, encoder_embed_dim)`. Pool with `.mean(dim=1)` for a single embedding per sample.
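As a shape sanity check, here is a minimal sketch, assuming `encoder` was loaded via `from_pretrained` as in the Loading examples below and that its `forward` accepts the pre-patched tensor directly:

```python
import torch

# One 24-hour window = 288 readings -> 288 / 12 = 24 patches of patch_size 12
x = torch.randn(8, 24, 12)      # (B, num_patches, patch_size), z-scored glucose

with torch.no_grad():
    z = encoder(x)              # (B, 24, 96) per-patch embeddings
emb = z.mean(dim=1)             # (B, 96): one embedding per window
```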

X-CGM-JEPA adds a second pretraining branch that predicts Glucodensity image patches; only the temporal encoder is loaded at inference.

### `baselines/gluformer.pt`

`models.gluformer.GluFormer`:

| Field | Value |
|---|---|
| `vocab_size` | 278 |
| `embed_dim` | 96 |
| `nhead` | 6 |
| `num_layers` | 3 |
| `dim_feedforward` | 192 |
| `max_seq_length` | 25000 |
| `dropout` | 0.0 |
| `pad_token` | 278 (= `vocab_size`) |

Input: a tensor of integer bin indices in `[0, vocab_size)`, obtained by discretizing raw glucose over the 40–320 mg/dL range into bins of width `(320 − 40) / vocab_size` (≈ 1.0 mg/dL). The downstream pipeline detaches GluFormer's output head and uses only the encoder embedding.
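A sketch of that discretization (the exact rounding/clipping convention is an assumption here; the authoritative version lives in the code repo):

```python
import torch

VOCAB_SIZE = 278
LO, HI = 40.0, 320.0
BIN_WIDTH = (HI - LO) / VOCAB_SIZE   # ~1.007 mg/dL per bin

def discretize(glucose_mg_dl: torch.Tensor) -> torch.Tensor:
    """Map raw glucose readings (mg/dL) to integer bin indices in [0, VOCAB_SIZE)."""
    idx = ((glucose_mg_dl - LO) / BIN_WIDTH).long()
    return idx.clamp(0, VOCAB_SIZE - 1)  # clip out-of-range readings into the edge bins
```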

### `baselines/ts2vec.pkl`

`models.ts2vec.TS2Vec` (loaded via `eval/baseline_utils/ts2vec_utils.py:load_pretrained_ts2vec`):

| Field | Value |
|---|---|
| `input_dims` | 1 |
| `output_dims` | 96 |
| `hidden_dims` | 64 |
| `depth` | 10 |

Saved as a Python pickle of the full model object, matching the upstream ts2vec library convention.

## Loading examples

### CGM-JEPA / X-CGM-JEPA: `from_pretrained` one-liner

`Encoder` is a `PyTorchModelHubMixin` subclass, so the architecture hyperparameters and weights load in a single call directly from this repo:

```python
from models.encoder import Encoder

encoder = Encoder.from_pretrained("CRUISEResearchGroup/CGM-JEPA", subfolder="cgm_jepa")
encoder.eval()

# X-CGM-JEPA: same call, different subfolder
encoder_x = Encoder.from_pretrained("CRUISEResearchGroup/CGM-JEPA", subfolder="x_cgm_jepa")
```

The `config.json` for each subfolder is auto-introspected from `Encoder.__init__`, so no architecture wiring is needed on the user side.

### From the CGM-JEPA code repository

`config/model_configs.py` looks for these checkpoints under `Output/cgm_jepa/`, `Output/x_cgm_jepa/`, and `Output/baselines/`. The `huggingface-cli download CRUISEResearchGroup/CGM-JEPA --local-dir Output` flow above produces exactly that structure, so the eval pipeline picks them up automatically.

Standalone PyTorch β€” GluFormer

```python
import torch
import torch.nn as nn
from models.gluformer.gluformer import GluFormer

vocab_size = 278
gluformer = GluFormer(
    vocab_size=vocab_size,
    embed_dim=96,
    nhead=6,
    num_layers=3,
    dim_feedforward=192,
    max_seq_length=25000,
    dropout=0.0,
    pad_token=vocab_size,
)
gluformer.load_state_dict(
    torch.load("Output/baselines/gluformer.pt", map_location="cpu")["encoder"]
)
gluformer.output_head = nn.Identity()   # discard the LM head for embedding extraction
gluformer.eval()
```
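Hypothetical usage after the head is detached, assuming `forward` on a `(B, T)` batch of bin indices returns per-timestep hidden states once `output_head` is an `nn.Identity`:

```python
tokens = torch.randint(0, vocab_size, (1, 288))   # (B, T) discretized glucose bins
with torch.no_grad():
    h = gluformer(tokens)                         # (B, T, embed_dim) hidden states
embedding = h.mean(dim=1)                         # (B, embed_dim) window embedding
```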

Standalone PyTorch β€” TS2Vec

```python
from eval.baseline_utils.ts2vec_utils import load_pretrained_ts2vec

ts2vec = load_pretrained_ts2vec(
    checkpoint_path="Output/baselines/ts2vec.pkl",
    device="cpu",
    input_dims=1,
    output_dims=96,
    hidden_dims=64,
    depth=10,
)
```

## Pretraining

All four encoders were pretrained on the CGM-JEPA pretraining corpus under identical conditions:

| Setting | Value |
|---|---|
| Corpus | 228 subjects (22 Stanford + 206 Colas), 389,365 readings at 5-min sampling |
| Window length | 288 timesteps (24 hours) |
| Masking ratio | 0.25 |
| Epochs | 101 |
| Batch size | 128 |
| Learning rate | 1e-4 |
| Random seed | 43 |

See `config/config_pretrain.py` for the full configuration.
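For intuition: each 288-timestep window splits into 24 patches (patch_size 12), so the 0.25 masking ratio hides 6 patches per window. A minimal illustrative sketch of one such random patch mask follows; the actual masking scheme is whatever `config/config_pretrain.py` specifies:

```python
import torch

NUM_PATCHES = 288 // 12   # 24 patches per 24-hour window
MASK_RATIO = 0.25         # -> 6 masked patches per window

def sample_patch_mask(batch_size: int) -> torch.Tensor:
    """Boolean (B, NUM_PATCHES) mask; True marks patches hidden from the context encoder."""
    k = int(NUM_PATCHES * MASK_RATIO)
    idx = torch.rand(batch_size, NUM_PATCHES).topk(k, dim=1).indices
    return torch.zeros(batch_size, NUM_PATCHES, dtype=torch.bool).scatter(1, idx, True)
```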

## Intended use

- Frozen feature extraction from raw CGM windows (24-hour, 5-min sampled, 288 timesteps).
- Linear-probe or shallow-classifier downstream evaluation, especially the IR / β-cell dysfunction tasks in the paper.
- Comparison baseline for new CGM representation methods, with identical pretraining conditions across all four encoders shipped here.

## License & attribution

Released under the MIT license. When using these weights, please cite:

1. Our paper (citation TBD; see the code repo).
2. The two upstream pretraining datasets: Metwally et al. 2025 (Nature Biomedical Engineering) and Colas et al. 2019 (PLOS ONE).
3. The original baseline papers when using `gluformer.pt` or `ts2vec.pkl`.

## Citation

Citation block to be filled once the CGM-JEPA paper has a stable venue / arXiv link.

## Code repository

https://github.com/cruiseresearchgroup/CGM-JEPA