CGM-JEPA Pretrained Encoders
Frozen self-supervised encoder weights from the paper CGM-JEPA: Learning Consistent Continuous Glucose Monitor Representations via Predictive Self-Supervised Pretraining. The repo contains the exact checkpoints used to produce Tables 1–8 of the paper, covering both the paper's main contributions (CGM-JEPA, X-CGM-JEPA) and the two re-pretrained baselines (GluFormer, TS2Vec).
Companion repos: pretraining dataset CRUISEResearchGroup/CGM-JEPA-Pretraining, labeled splits CRUISEResearchGroup/CGM-JEPA-Downstream, code github.com/cruiseresearchgroup/CGM-JEPA.
MOMENT and Mantis are not redistributed here. Those baselines are loaded directly from their upstream HF repos (AutonLab/MOMENT-1-{small,large}, paris-noah/Mantis-8M) by the eval pipeline.
Quick start
```
huggingface-cli download CRUISEResearchGroup/CGM-JEPA --local-dir Output
```
Then from the code repository:
```
# Reproduce paper Tables 1–6
python scripts/run_all_eval.py
```
The downstream eval will load all four checkpoints automatically from the subdirectories below.
Layout
```
.
├── cgm_jepa/
│   ├── model.safetensors
│   └── config.json
├── x_cgm_jepa/
│   ├── model.safetensors
│   └── config.json
└── baselines/
    ├── gluformer.pt
    └── ts2vec.pkl
```
cgm_jepa/ and x_cgm_jepa/ use the standard PyTorchModelHubMixin layout (model.safetensors for weights, config.json for architecture hyperparameters), so they load via the standard from_pretrained one-liner (see Loading examples).
baselines/gluformer.pt is {"encoder": state_dict} and baselines/ts2vec.pkl is a full pickled TS2Vec model object (per the upstream library's convention). Their architectures are documented in the Architectures section.
Important note on the baselines
gluformer.pt and ts2vec.pkl are not vendored from upstream releases of those methods. They were re-pretrained on the same open CGM corpus and compute budget as CGM-JEPA / X-CGM-JEPA (Stanford + Colas, 101 epochs, batch 128, lr 1e-4, seed 43) so that the comparison in the paper isolates the pretraining objective rather than mixing in corpus or compute differences. Use these checkpoints when reproducing paper numbers; for other settings, prefer the original authors' releases.
Architectures
cgm_jepa/ and x_cgm_jepa/
Both use the same models.encoder.Encoder class with identical hyperparameters; only the pretraining objective differs. At downstream / inference time only the temporal encoder is used, so the two checkpoints are drop-in interchangeable.
| Field | Value |
|---|---|
| patch_size | 12 |
| encoder_kernel_size | 3 |
| encoder_embed_dim | 96 |
| encoder_embed_bias | True |
| encoder_nhead | 6 |
| encoder_num_layers | 3 |
| encoder_dropout | 0.0 |
Input: a tensor of shape (B, num_patches, patch_size) (raw glucose values, z-scored).
Output: per-patch embedding of shape (B, num_patches, embed_dim). Pool with .mean(dim=1) for a single embedding per sample.
X-CGM-JEPA adds a second pretraining branch that predicts Glucodensity image patches; only the temporal encoder is loaded at inference.
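For example, a 24-hour window of 288 readings with patch_size = 12 yields 24 patches. A minimal shape sketch (the random tensors stand in for z-scored glucose and encoder output; they are illustrative, not the pipeline's actual values):

```python
import torch

B, T, patch_size = 4, 288, 12  # 24-hour window at 5-min sampling
x = torch.randn(B, T)          # stand-in for z-scored glucose values

# Fold the series into non-overlapping patches: (B, num_patches, patch_size)
patches = x.view(B, T // patch_size, patch_size)  # (4, 24, 12)

# The encoder maps this to per-patch embeddings (B, 24, 96);
# mean-pool over patches for one embedding per sample.
emb = torch.randn(B, T // patch_size, 96)  # placeholder for encoder output
pooled = emb.mean(dim=1)                   # (4, 96)
```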
baselines/gluformer.pt
models.gluformer.GluFormer:
| Field | Value |
|---|---|
| vocab_size | 278 |
| embed_dim | 96 |
| nhead | 6 |
| num_layers | 3 |
| dim_feedforward | 192 |
| max_seq_length | 25000 |
| dropout | 0.0 |
| pad_token | 278 (= vocab_size) |
Input: a tensor of integer bin indices in [0, vocab_size) (raw glucose discretized over the 40–320 mg/dL range with bin width (320 − 40) / vocab_size). The downstream pipeline detaches GluFormer's output head and uses only the encoder embedding.
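As a sketch of that binning (the exact clipping and rounding convention is an assumption here, not taken from the repo):

```python
import numpy as np

vocab_size = 278
lo, hi = 40.0, 320.0
width = (hi - lo) / vocab_size  # ~1.007 mg/dL per bin

def glucose_to_bins(values_mg_dl):
    """Map raw glucose readings (mg/dL) to integer bin indices in [0, vocab_size)."""
    # Clip to the supported range; subtract a tiny epsilon so hi maps to the last bin.
    v = np.clip(np.asarray(values_mg_dl, dtype=float), lo, hi - 1e-9)
    return ((v - lo) // width).astype(int)

bins = glucose_to_bins([40.0, 120.0, 319.9])
```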
baselines/ts2vec.pkl
models.ts2vec.TS2Vec (loaded via eval/baseline_utils/ts2vec_utils.py:load_pretrained_ts2vec):
| Field | Value |
|---|---|
| input_dims | 1 |
| output_dims | 96 |
| hidden_dims | 64 |
| depth | 10 |
Saved as a Python pickle of the full model object, matching the upstream ts2vec library convention.
Loading examples
CGM-JEPA / X-CGM-JEPA – from_pretrained one-liner
Encoder is a PyTorchModelHubMixin subclass, so the architecture hyperparameters and weights load in a single call directly from this repo:
```python
from models.encoder import Encoder

encoder = Encoder.from_pretrained("CRUISEResearchGroup/CGM-JEPA", subfolder="cgm_jepa")
encoder.eval()

# X-CGM-JEPA: same call, different subfolder
encoder_x = Encoder.from_pretrained("CRUISEResearchGroup/CGM-JEPA", subfolder="x_cgm_jepa")
```
config.json for each subfolder is auto-introspected from Encoder.__init__, so no architecture wiring is needed on the user side.
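To illustrate the mechanism, here is a generic sketch of the huggingface_hub mixin pattern (TinyEncoder is a made-up class, not the repo's actual Encoder):

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class TinyEncoder(nn.Module, PyTorchModelHubMixin):
    # __init__ kwargs are introspected by the mixin and serialized to
    # config.json on save_pretrained(), so from_pretrained() can rebuild
    # the architecture with no wiring on the user side.
    def __init__(self, embed_dim: int = 96, num_layers: int = 3):
        super().__init__()
        self.layers = nn.Sequential(
            *[nn.Linear(embed_dim, embed_dim) for _ in range(num_layers)]
        )
```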
From the CGM-JEPA code repository
config/model_configs.py looks for these checkpoints under Output/cgm_jepa/, Output/x_cgm_jepa/, and Output/baselines/. The huggingface-cli download CRUISEResearchGroup/CGM-JEPA --local-dir Output flow above produces exactly that structure, so the eval pipeline picks them up automatically.
Standalone PyTorch – GluFormer
```python
import torch
import torch.nn as nn

from models.gluformer.gluformer import GluFormer

vocab_size = 278
gluformer = GluFormer(
    vocab_size=vocab_size,
    embed_dim=96,
    nhead=6,
    num_layers=3,
    dim_feedforward=192,
    max_seq_length=25000,
    dropout=0.0,
    pad_token=vocab_size,
)
gluformer.load_state_dict(
    torch.load("Output/baselines/gluformer.pt", map_location="cpu")["encoder"]
)
gluformer.output_head = nn.Identity()  # discard the LM head for embedding extraction
gluformer.eval()
```
Standalone PyTorch – TS2Vec
```python
from eval.baseline_utils.ts2vec_utils import load_pretrained_ts2vec

ts2vec = load_pretrained_ts2vec(
    checkpoint_path="Output/baselines/ts2vec.pkl",
    device="cpu",
    input_dims=1,
    output_dims=96,
    hidden_dims=64,
    depth=10,
)
```
Pretraining
All four encoders were pretrained on the CGM-JEPA pretraining corpus under identical conditions:
| Setting | Value |
|---|---|
| Corpus | 228 subjects (22 Stanford + 206 Colas), 389,365 readings at 5-min sampling |
| Window length | 288 timesteps (24 hours) |
| Masking ratio | 0.25 |
| Epochs | 101 |
| Batch size | 128 |
| Learning rate | 1e-4 |
| Random seed | 43 |
See config/config_pretrain.py for the full configuration.
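The window and masking settings combine as follows: 288 timesteps at patch_size 12 give 24 patches, of which a 0.25 ratio hides 6 per window. A small sketch (whether masking targets random patches or contiguous spans is not stated here, so the random-patch choice is an assumption):

```python
import numpy as np

rng = np.random.default_rng(43)  # the paper's seed, used here only for illustration
window, patch_size, mask_ratio = 288, 12, 0.25
num_patches = window // patch_size        # 24 patches per 24-hour window
n_masked = int(num_patches * mask_ratio)  # 6 patches hidden from the encoder

# Boolean mask over patches: True = masked (to be predicted)
masked_idx = rng.choice(num_patches, size=n_masked, replace=False)
mask = np.zeros(num_patches, dtype=bool)
mask[masked_idx] = True
```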
Intended use
- Frozen feature extraction from raw CGM windows (24-hour, 5-min sampled, 288 timesteps).
- Linear-probe or shallow-classifier downstream evaluation, especially the IR / β-cell dysfunction tasks in the paper.
- Comparison baseline for new CGM representation methods, with identical pretraining conditions across all four encoders shipped here.
License & attribution
Released under the MIT license. When using these weights, please cite:
- Our paper (citation TBD; see code repo).
- The two upstream pretraining datasets: Metwally et al. 2025 (Nature Biomedical Engineering) and Colas et al. 2019 (PLOS ONE).
- The original baseline papers when using gluformer.pt or ts2vec.pkl.
Citation
Citation block to be filled once the CGM-JEPA paper has a stable venue / arXiv link.