
🚀 NormWear2.0: A World Model for Multivariate Physiological Signals.


This is the official implementation of the paper "Toward World Modeling of Physiological Signals with Chaos-Balancing and Latent Dynamical Modeling".

✨ Introduction

Physiological time-series signals reflect complex, multi-scale dynamical processes of the human body. Existing modeling studies primarily focus on static downstream tasks such as classification, event forecasting, or short-horizon next-step prediction, while long-horizon signal-level forecasting and the inherently temporal and predictive nature of physiological signals remain underexplored. We introduce NormWear-2, a world model that encodes multimodal physiological signals and intervention variables into a shared latent space and models their temporal evolution through latent dynamics. Our approach combines inference from prior pre-trained knowledge (intuition) with instant unsupervised adaptation for latent transition inference (insight), enabling forecasting across multiple temporal scales, from raw waveform prediction to minute-, hour-, and day-level physiological pattern evolution. During the pretraining phase, we show that effective dynamical modeling is influenced by the diversity of dynamical regimes present in the pretraining data. To this end, we propose chaos-theory-inspired metrics to quantify time-series dynamics and construct dynamically balanced pretraining corpora. Pretraining on such data leads to more robust latent representations, improving forecasting quality while maintaining competitive downstream performance. We conduct experiments across multiple real-world physiological datasets with diverse temporal resolutions and intervention information at the levels of daily life, point of care, and clinical practice. Our results demonstrate that combining dynamics-aware pretraining with latent dynamical modeling consistently improves signal-level forecasting quality in the time, frequency, and representation domains, providing a step toward general-purpose world models for physiological signals.


📈 Usage

The full code base is available on GitHub:

# Clone the repository
git clone git@github.com:Mobile-Sensing-and-UbiComp-Laboratory/NormWear2.git

Inference Example

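import torch
from transformers import AutoModel

# Compatibility shim from the original example: alias the unsigned integer
# dtypes to their signed counterparts before loading the checkpoint.
torch.uint64 = torch.int64
torch.uint32 = torch.int32
torch.uint16 = torch.int16

# trust_remote_code=True is needed because the model class ships with the repository.
model = AutoModel.from_pretrained(
    "mosaic-laboratory/normwear2",
    trust_remote_code=True
)
model.eval()  # switch to inference mode
print("Load success!")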



# ----- Example forecast I/O -------------------------
# synthetic data
x = torch.rand(2, 256, 3) # bs, L, nvar

# config
context_length = 128
pred_length = 64

# model forward
with torch.no_grad():
    base_out = model.predict(x[:, :context_length, :], pred_length)
print(f"{base_out.shape=}") # base_out.shape=torch.Size([2, 64, 3])
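
The forecast above can be compared against the held-out part of the input window. The following is a minimal sketch, not part of the released code: `forecast_errors` is a hypothetical helper, and a naive "repeat the last value" forecast stands in for `model.predict` so that the snippet runs without downloading the checkpoint.

```python
import torch


def forecast_errors(pred: torch.Tensor, target: torch.Tensor) -> dict:
    """Return MSE and MAE averaged over batch, time, and variables.

    Hypothetical helper for illustration; it is not part of NormWear2.
    """
    if pred.shape != target.shape:
        raise ValueError(
            f"shape mismatch: {tuple(pred.shape)} vs {tuple(target.shape)}"
        )
    diff = pred - target
    return {"mse": diff.pow(2).mean().item(), "mae": diff.abs().mean().item()}


torch.manual_seed(0)
x = torch.rand(2, 256, 3)  # bs, L, nvar (same synthetic layout as above)
context_length, pred_length = 128, 64

# Split each series into the observed context and the held-out target window.
context = x[:, :context_length, :]
target = x[:, context_length:context_length + pred_length, :]

# Stand-in forecast (assumption): repeat the last observed value per variable.
# Swap in `model.predict(context, pred_length)` to score the actual model.
naive_pred = context[:, -1:, :].expand(-1, pred_length, -1)

errors = forecast_errors(naive_pred, target)
print(errors)
```

On random synthetic data these numbers carry no meaning; the point is only the slicing convention, in which the target window starts where the context ends and has the same shape as the forecast.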

We provide demo code containing several examples of forecasting inference with and without the proposed latent Bayesian mechanism.

The training and downstream evaluation generally follow the same pipeline as NormWear. The Python library dependencies are specified in NormWear/dependencies.txt, with an example bash script in NormWear/config_env.sh.

🔥 Pre-training

To start pretraining, run the following command:

torchrun --nproc_per_node=4 \
    -m model_and_pretrain.pretrain_main --num_workers 2 \
    --batch_size 4 --accum_iter 4 \
    --epochs 100 --save_every_epoch 10 \
    --blr 2.5e-4 --min_lr 1e-5 --weight_decay 1e-2 \
    --clip_grad 1.0 --ddp 1 \
    --output_dir train_results/ckpts \
    --log_dir train_results/logs \
    --remark normwear2 \
    --mlp_ratio 4.0 --embed_dim 768 \
    --num_heads 12 --depth 12 --fuse_freq 2 \
    --decoder_embed_dim 512 --decoder_num_head 8 \
    --decoder_depth 2 \
    --window_size 4096 --nvar 10 \
    --use_casual 1 --use_cls 0 --token_level_fuse 1 \
    --jepa 0 

| Parameter | Type | Example Value | Description |
|---|---|---|---|
| `--nproc_per_node` | `<int>` | 4 | Number of GPU processes to launch on a single node for distributed training. Here, 4 GPUs/processes are used. |
| `--num_workers` | `<int>` | 2 | Number of subprocesses used for data loading in each training process. Higher values can improve data pipeline throughput. |
| `--batch_size` | `<int>` | 4 | Number of input samples processed in one forward/backward pass per GPU before gradient accumulation. |
| `--accum_iter` | `<int>` | 4 | Number of gradient accumulation steps. Effective batch size = batch_size × accum_iter × number_of_GPUs. |
| `--epochs` | `<int>` | 100 | Total number of full passes through the training dataset. |
| `--save_every_epoch` | `<int>` | 10 | Save model checkpoints every N epochs. Here, checkpoints are saved every 10 epochs. |
| `--blr` | `<float>` | 2.5e-4 | Base learning rate. Actual learning rate is scaled as: absolute_lr = base_lr × total_batch_size / 256. |
| `--min_lr` | `<float>` | 1e-5 | Minimum learning rate allowed during learning rate decay scheduling. |
| `--weight_decay` | `<float>` | 1e-2 | L2 regularization coefficient applied to model parameters to reduce overfitting. |
| `--clip_grad` | `<float>` | 1.0 | Maximum gradient norm for gradient clipping to stabilize training and prevent exploding gradients. |
| `--ddp` | `<int>` | 1 | Whether to enable Distributed Data Parallel (DDP) training. 1 means enabled. |
| `--output_dir` | `<string>` | train_results/ckpts | Directory used to save model checkpoints and training outputs. |
| `--log_dir` | `<string>` | train_results/logs | Directory used to save TensorBoard logs and other training logs. |
| `--remark` | `<string>` | normwear2 | Custom experiment name or tag used to identify the current training run. |
| `--mlp_ratio` | `<float>` | 4.0 | Expansion ratio of the MLP hidden dimension relative to embedding dimension inside Transformer blocks. |
| `--embed_dim` | `<int>` | 768 | Embedding dimension of the encoder Transformer model. |
| `--num_heads` | `<int>` | 12 | Number of attention heads in the encoder Transformer. |
| `--depth` | `<int>` | 12 | Number of Transformer blocks (layers) in the encoder. |
| `--fuse_freq` | `<int>` | 2 | Frequency of feature fusion across modalities/tokens. Fusion is performed every 2 layers. |
| `--decoder_embed_dim` | `<int>` | 512 | Embedding dimension of the decoder Transformer. |
| `--decoder_num_head` | `<int>` | 8 | Number of attention heads in the decoder Transformer. |
| `--decoder_depth` | `<int>` | 2 | Number of Transformer blocks (layers) in the decoder. |
| `--window_size` | `<int>` | 4096 | Input sequence length or temporal window size used for each training sample. |
| `--nvar` | `<int>` | 10 | Number of input variables/channels in the multivariate time-series input. |
| `--use_casual` | `<int>` | 1 | Whether to use causal attention masking. 1 means enabled (likely autoregressive masking). |
| `--use_cls` | `<int>` | 0 | Whether to use a CLS token for sequence representation. 0 means disabled. |
| `--token_level_fuse` | `<int>` | 1 | Whether to perform token-level feature fusion instead of only higher-level fusion. 1 means enabled. |
| `--jepa` | `<int>` | 0 | Whether to enable JEPA (Joint Embedding Predictive Architecture) training mode. 0 means disabled. |
| `--warmup_epochs` | `<int>` | 10 (default) | Number of initial epochs used to gradually increase learning rate from a small value to blr. (Not explicitly set in the command, so default is used.) |
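
The batch-size and learning-rate formulas in the `--accum_iter` and `--blr` descriptions can be checked with a few lines of arithmetic. The sketch below applies them to the values in the example command. The function names are hypothetical, and the schedule in `lr_at_epoch` (linear warmup followed by half-cycle cosine decay down to `--min_lr`, with warmup ending at the scaled rate) is an assumption about a common choice, not something confirmed by the repository.

```python
import math


def effective_batch_size(batch_size: int, accum_iter: int, num_gpus: int) -> int:
    """Effective batch size = batch_size x accum_iter x number_of_GPUs."""
    return batch_size * accum_iter * num_gpus


def absolute_lr(blr: float, total_batch_size: int) -> float:
    """absolute_lr = base_lr x total_batch_size / 256."""
    return blr * total_batch_size / 256


def lr_at_epoch(epoch: float, peak_lr: float, min_lr: float,
                warmup_epochs: int, total_epochs: int) -> float:
    """ASSUMED schedule: linear warmup, then half-cycle cosine decay to min_lr."""
    if epoch < warmup_epochs:
        return peak_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + (peak_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))


# Values taken from the example command: 4 GPUs, batch size 4, 4 accumulation steps.
total_bs = effective_batch_size(batch_size=4, accum_iter=4, num_gpus=4)
peak_lr = absolute_lr(blr=2.5e-4, total_batch_size=total_bs)

print(total_bs)  # 64
print(peak_lr)   # 6.25e-05

# Learning rate at a few epochs under the assumed schedule.
for epoch in (0, 5, 10, 55, 100):
    lr = lr_at_epoch(epoch, peak_lr, min_lr=1e-5, warmup_epochs=10, total_epochs=100)
    print(epoch, lr)
```

With the example settings, the effective batch size is 64, so the scaled learning rate is a quarter of `--blr`. Changing the number of GPUs or accumulation steps changes the actual learning rate even when `--blr` stays fixed.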

📝 Citation

If you find the NormWear-2 model useful for your research, please consider citing the associated paper.

@misc{luo2026worldmodelingphysiologicalsignals,
      title={Toward World Modeling of Physiological Signals with Chaos-Theoretic Balancing and Latent Dynamics}, 
      author={Yunfei Luo and Xi Chen and Yuliang Chen and Lanshuang Zhang and Md Mofijul Islam and Siwei Zhao and Peter Kotanko and Subhasis Dasgupta and Andrew Campbell and Rakesh Malhotra and Tauhidur Rahman},
      year={2026},
      eprint={2605.15465},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2605.15465}, 
}

📃 License

This project is licensed under the Apache License 2.0.
