# JL-CSDI ESc1 Foundation Model
Probabilistic order-flow forecaster for E-mini S&P 500 futures (ESc1). Given a 100-window context (roughly 50,000 recent trades), the model produces Monte Carlo samples of the joint distribution of 8 trade-time order-flow features over the next 10 windows.
## Architecture
CSDI (Tashiro et al., NeurIPS 2021) with the Jump-Laplace noise model from Baule (2025), v2 architecture variant.
| Parameter | Value |
|---|---|
| Total parameters | 61.93 M |
| Residual blocks | 6 with dual attention (temporal then cross-feature) |
| Channels per block | 528 |
| Attention heads | 6 |
| Feed-forward dim | 2112 |
| Diffusion steps (training) | 800 |
| JL reverse-sampling steps | 50, uniform Δt = 0.12, T_max = 6.0 |
| Noise scale σ | 1.0 (global) |
| Noise schedule | tanh6-1 (Li et al., Nature Machine Intelligence 2024) |
The released weights are the EMA snapshot at the best validation epoch (87). Training-only auxiliaries (the distributional auxiliary head and importance reweighting) are disabled in the release config because they play no role at inference.
## Training data
ESc1 trade-by-trade data from LSEG TickHistory, aggressor-labeled (explicit field where present, qualifier-regex recovery otherwise).
| Split | Range | Windows | Trades |
|---|---|---|---|
| Train | 2016-07-01 to 2025-01-01 | 1.33 M | 665 M |
| Validation | 2025-01-01 to 2025-04-01 | 6,855 | filtered subset |
| Held-out test | 2025-04-01 to 2026-04-01 | 23,175 | filtered subset |
The training range covers Volmageddon (February 2018), the COVID crash and recovery (March 2020 onward), the 2023 banking crisis (Silicon Valley Bank, Credit Suisse), and the 2024 volatility-compression regime. Pre-July 2016 data is excluded because LSEG TickHistory does not expose trade aggressor for CME futures in that period.
Backtesting on dates inside the training window introduces leakage. For clean out-of-sample evaluation use 2026-04-01 onward, or pre-2016-07 data (which requires a different source, since TickHistory lacks aggressor labels there); a simple date guard is sketched below.
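A backtest driver can enforce this mechanically. A minimal sketch (the helper name is illustrative; the range is copied from the split table above and spans train, validation, and held-out test):

```python
from datetime import date

# Full development range: train start through held-out test end.
TRAIN_START, DATA_END = date(2016, 7, 1), date(2026, 4, 1)

def assert_out_of_sample(day: date) -> None:
    """Raise if `day` falls inside any split touched during development."""
    if TRAIN_START <= day < DATA_END:
        raise ValueError(
            f"{day} lies inside the 2016-07-01..2026-04-01 development "
            "range; backtest results would be contaminated."
        )

assert_out_of_sample(date(2026, 6, 15))  # passes: strictly out of sample
```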
## Input and output

Input: a dict with

```text
"observed_data"   (B, L=110, K=8)
"observed_mask"   (B, L=110, K=8)
"gt_mask"         (B, L=110, K=8)   1 over the 100 context windows, 0 over the 10 forecast windows
"timepoints"      (B, L=110)
"feature_id"      (B, K=8)
```

Output: `samples` of shape (B, n_samples, K=8, L=110), i.e. n_samples Monte Carlo realizations of the joint forecast.
The 8 channels are ESc1 trade-time order-flow features: the Maitrier and Bouchaud a-family imbalance at a = 0, 0.25, 0.5, 0.75, 1.0; log return; log realized variance; and log total volume. Channel order and normalization scalars are in `normalization_params.json`.
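For orientation, a minimal sketch of assembling a batch with these shapes. The tensor contents are placeholders, and treating timepoints as plain window indices is an assumption (check inference.py for the exact convention):

```python
import torch

B, L, K, CONTEXT = 4, 110, 8, 100   # 100 context windows + 10 forecast windows

observed_data = torch.zeros(B, L, K)   # normalized features; forecast rows can stay zero
observed_mask = torch.ones(B, L, K)    # 1 wherever observed_data holds a valid value
gt_mask = torch.zeros(B, L, K)
gt_mask[:, :CONTEXT, :] = 1.0          # 1 over context, 0 over the forecast horizon

batch = {
    "observed_data": observed_data,
    "observed_mask": observed_mask,
    "gt_mask": gt_mask,
    "timepoints": torch.arange(L).float().expand(B, L),  # assumption: integer window index
    "feature_id": torch.arange(K).expand(B, K),
}
```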
## Performance

Evaluated on 100 windows from the held-out test set (2025-04-01 onward), with n_samples = 32, on an RTX 5090. Results for the two production samplers:
| Metric | Vanilla JL-SDE | MGD-conditional |
|---|---|---|
| Validation loss (EMA, epoch 87) | 0.0220 | 0.0220 |
| 90% interval empirical coverage on imbalance channels, low / mid / high volatility | 0.93 / 0.93 / 0.94 | 0.88 / 0.89 / 0.88 |
| Relative error on cov(imb_a0, imb_a025) vs realized | 5.8 % | 1.9 % |
| Sample diversity ratio (within-MC std / MC-mean trajectory std) | 4.05 to 4.83 | 4.11 to 4.83 |
| Wall time per 100 windows | 493 s | 470 s |
MGD-conditional is the recommended sampler. It applies a moment-guided correction at sampling time (Lempereur et al., 2026) that enforces calibrated cross-channel coupling without retraining: its 90 % intervals sit closest to the nominal level in all three volatility regimes, and its cross-channel covariance error is roughly a third of the vanilla sampler's. A sample diversity ratio above 1.0 in every channel confirms no mean collapse.
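Both headline metrics can be recomputed straight from the sample tensor. A sketch under one plausible reading of the table's definitions (function names are illustrative):

```python
import torch

def interval_coverage(samples: torch.Tensor, realized: torch.Tensor,
                      level: float = 0.90) -> torch.Tensor:
    """Empirical coverage of the central `level` interval.

    samples:  (B, n_samples, K, L) Monte Carlo forecasts
    realized: (B, K, L) realized values on the same normalized scale
    """
    lo = samples.quantile((1 - level) / 2, dim=1)        # (B, K, L)
    hi = samples.quantile(1 - (1 - level) / 2, dim=1)    # (B, K, L)
    hit = (realized >= lo) & (realized <= hi)
    return hit.float().mean()

def diversity_ratio(samples: torch.Tensor) -> torch.Tensor:
    """Within-MC std over the std of the MC-mean trajectory, per channel."""
    within = samples.std(dim=1).mean(dim=(0, 2))   # spread across MC draws
    mean_traj = samples.mean(dim=1)                # (B, K, L)
    across = mean_traj.std(dim=2).mean(dim=0)      # variability of the mean path
    return within / across                         # > 1.0 in every channel => no mean collapse
```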
## Quick start

```bash
git lfs install
git clone https://huggingface.co/S-teven/jl-csdi-mgd
cd jl-csdi-mgd
pip install -r requirements.txt
python inference.py --sampler sde-mgd --n-samples 32
```
The `inference.py` example runs on a synthetic batch and prints the output tensor shape. For real data, normalize the raw 8-channel features with `normalization_params.json` (V5-hybrid-D: T-scaling on the imbalance channels, sqrt-T and delta-std on log_ret, MAD plus z-score on log_realized_var and log_tot_vol).
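The JSON schema itself is not documented here; purely as an illustration (the `channels`, `shift`, and `scale` key names are hypothetical, and any window-dependent T factors are ignored), most of the per-channel steps reduce to an affine map once the scalars are fixed:

```python
import json
import numpy as np

# Hypothetical layout: {"channels": [{"name": ..., "shift": ..., "scale": ...}, ...]}
params = json.load(open("normalization_params.json"))

def normalize(raw: np.ndarray) -> np.ndarray:
    """Affine per-channel normalization. raw: (L, K) in repo channel order."""
    shift = np.array([c["shift"] for c in params["channels"]])
    scale = np.array([c["scale"] for c in params["channels"]])
    return (raw - shift) / scale
```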
## Inference modes

`inference.py` accepts `--sampler {sde, sde-mgd}`.

- `sde`: vanilla Jump-Laplace SDE reverse sampling (Baule, 2025).
- `sde-mgd`: the vanilla SDE with a conditional moment-guided correction (recommended).
Programmatic usage is straightforward once the model is loaded:

```python
import yaml
import torch
from safetensors.torch import load_file

from main_model import CSDI_Forecasting

# Build the model from the release config and load the EMA weights.
cfg = yaml.safe_load(open("config.yaml"))
model = CSDI_Forecasting(cfg, "cuda", cfg["model"]["target_dim"]).to("cuda")
model.load_state_dict(load_file("model.safetensors"), strict=True)
model.eval()

# observed_data, cond_mask, side_info come from a prepared batch.
with torch.no_grad():
    samples = model.impute_jl_sde_mgd(
        observed_data, cond_mask, side_info, n_samples=32,
        mgd_target_mode="conditional",
    )
```
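`samples` has shape (B, n_samples, K=8, L=110), so downstream statistics come straight from the sample axis. For example (assuming, per the 100+10 window layout, that the forecast occupies the last 10 steps):

```python
forecast = samples[..., -10:]                  # (B, n_samples, K, 10) forecast region
median_path = forecast.median(dim=1).values    # pointwise median forecast
band_lo = forecast.quantile(0.05, dim=1)       # 90% central band, lower edge
band_hi = forecast.quantile(0.95, dim=1)       # 90% central band, upper edge
```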
## Files

| File | Purpose |
|---|---|
| `model.safetensors` | EMA weights, Lightning prefix stripped (198 tensors, 61.93 M params, 248 MB) |
| `config.yaml` | Architecture hyperparameters; inference-time configuration |
| `normalization_params.json` | V5-hybrid-D normalization scalars |
| `mgd_target_moments.npz` | Optional precomputed unconditional MGD trajectory |
| `main_model.py` | CSDI_Forecasting class |
| `diff_models_v2.py` | Diffusion network |
| `jl_noise.py` | Jump-Laplace forward and sampling primitives |
| `mgd_step_torch.py` | Moment-guided sampling-time correction (centered polynomial moments) |
| `inference.py` | Working example |
| `requirements.txt` | Pinned dependencies |
## Limitations

The model outputs Monte Carlo samples only; downstream consumers compute the quantities they need (probabilities, quantiles, joint event likelihoods) from those samples, as in the sketch below. It is trained exclusively on ESc1, and transfer to other instruments has not been evaluated. Backtest only on dates outside the training window (see Training data).
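For instance, an event probability is just a sample frequency. Assuming channel 0 is the a = 0 imbalance (per the channel list above) and the forecast occupies the last 10 of the 110 steps:

```python
# P(first-forecast-window a=0 imbalance > 0), estimated per batch element.
p_positive = (samples[:, :, 0, -10] > 0).float().mean(dim=1)   # shape (B,)
```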
## References
Tashiro, Y., Song, J., Song, Y., Ermon, S. (2021). CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation. Advances in Neural Information Processing Systems 34, 24804-24816. arXiv:2107.03502.
Baule, A. (2025). Generative modelling with jump-diffusions. arXiv:2503.06558.
Lempereur, E., Cuvelle-Magar, N., Coeurdoux, F., Mallat, S., Vanden-Eijnden, E. (2026). MGD: Moment Guided Diffusion for Maximum Entropy Generation. arXiv:2602.17211.
Maitrier, G., Bouchaud, J.-P. (2025). The Subtle Interplay between Square-root Impact, Order Imbalance and Volatility: A Unifying Framework. arXiv:2506.07711.
Li, T., Biferale, L., Bonaccorso, F., Scarpolini, M. A., Buzzicotti, M. (2024). Synthetic Lagrangian turbulence by generative diffusion models. Nature Machine Intelligence 6(4), 393-403. arXiv:2307.08529.
Yang, Y., Zha, K., Chen, Y.-C., Wang, H., Katabi, D. (2021). Delving into Deep Imbalanced Regression. Proceedings of the 38th International Conference on Machine Learning. arXiv:2102.09554.
## License
Internal and research use. Not for redistribution.