| license: other | |
| license_name: embedl-models-community-licence-v1.0 | |
| license_link: https://github.com/embedl/embedl-models/blob/main/LICENSE | |
| base_model: amazon/chronos-2 | |
| quantized_from: amazon/chronos-2 | |
| tags: | |
| - time-series | |
| - time-series-forecasting | |
| - chronos | |
| - chronos-2 | |
| - int8 | |
| - tensorrt | |
| - quantization | |
| - edge | |
| - jetson | |
| - orin | |
| library_name: onnx | |
| pipeline_tag: time-series-forecasting | |
| gated: true | |
| extra_gated_heading: Acknowledge Embedl Models Community Licence v1.0 | |
| extra_gated_description: | | |
| By requesting access you agree to the Embedl Models Community | |
| Licence v1.0 (no redistribution as a hosted service) and to the | |
| upstream chronos-2 license terms. | |
| extra_gated_button_content: Request access | |
| <!-- embedl-banner:start --> | |
| <style> | |
| .embedl-btn-primary { transition: background 160ms ease, box-shadow 160ms ease; } | |
| .embedl-btn-primary:hover { background: #4FDCE4 !important; box-shadow: 0 8px 22px rgba(45,212,221,0.45) !important; } | |
| .embedl-btn-secondary { transition: background 160ms ease; } | |
| .embedl-btn-secondary:hover { background: rgba(45,212,221,0.15) !important; } | |
| .embedl-headline { font-size: clamp(11px, 2.15vw, 15px) !important; } | |
| .embedl-btn-primary, .embedl-btn-secondary { | |
| font-size: clamp(11px, 1.65vw, 13px) !important; | |
| padding: clamp(6px, 1.1vw, 9px) clamp(10px, 1.6vw, 14px) !important; | |
| } | |
| </style> | |
| <div style="background:radial-gradient(600px 220px at 0% 50%,rgba(45,212,221,0.22) 0%,rgba(45,212,221,0) 60%),radial-gradient(400px 180px at 100% 100%,rgba(45,212,221,0.10) 0%,rgba(45,212,221,0) 55%),linear-gradient(135deg,#0B1626 0%,#142338 100%);border:1px solid rgba(45,212,221,0.28);border-radius:12px;padding:22px 24px;margin:0 0 24px 0;color:#F2F6FA;box-shadow:0 4px 16px rgba(11,22,38,0.18);overflow:hidden;box-sizing:border-box;max-width:100%;"> | |
| <table style="width:100%;border-collapse:collapse;border:0;background:transparent;"> | |
| <tr style="background:transparent;"> | |
| <td style="vertical-align:middle;border:0;padding:0;background:transparent;"> | |
| <div style="display:inline-block;font-size:10px;letter-spacing:0.08em;text-transform:uppercase;font-weight:700;color:#2DD4DD;background:rgba(45,212,221,0.15);border:1px solid rgba(45,212,221,0.35);padding:4px 10px;border-radius:999px;margin-bottom:10px;white-space:nowrap;">Optimized by Embedl</div> | |
| <div class="embedl-headline" style="font-size:15px;font-weight:700;line-height:1.35;color:#F2F6FA;margin-bottom:4px;">Need to <span style="color:#2DD4DD;white-space:nowrap;">fine-tune</span>, hit <span style="color:#2DD4DD;white-space:nowrap;">performance targets</span>, or deploy on <span style="color:#2DD4DD;white-space:nowrap;">specific hardware</span>?</div> | |
| <div style="font-size:13px;color:#9BA7B5;">We've got you covered.</div> | |
| </td> | |
| <td width="1%" style="vertical-align:middle;border:0;padding:0 0 0 18px;white-space:nowrap;text-align:right;background:transparent;"> | |
| <a href="https://www.embedl.com/models" class="embedl-btn-secondary" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #2DD4DD;color:#2DD4DD;text-decoration:none;margin-right:8px;">Learn more</a> | |
| <a href="https://www.embedl.com/contact" class="embedl-btn-primary" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #2DD4DD;background:#2DD4DD;color:#0B1626;text-decoration:none;box-shadow:0 6px 18px rgba(45,212,221,0.28);">Get in touch →</a> | |
| </td> | |
| </tr> | |
| </table> | |
| </div> | |
| <!-- embedl-banner:end --> | |
| # Embedl Chronos-2 (Quantized for TensorRT) | |
| Deployable INT8-quantized version of | |
| [`amazon/chronos-2`](https://huggingface.co/amazon/chronos-2), | |
| optimized with | |
| [embedl-deploy](https://github.com/embedl/embedl-deploy) for | |
| low-latency NVIDIA TensorRT inference on edge GPUs. Two | |
| static-context variants ship: **ctx=512** for short-history | |
| forecasting and **ctx=2048** for long-history use cases. | |
| ## Upstream Model | |
| <a href="https://hfviewer.com/amazon/chronos-2?utm_source=huggingface&utm_medium=embedded_model_card&utm_campaign=amazon__chronos-2_card" target="_blank" rel="noopener"> | |
| <img | |
| src="https://hfviewer.com/api/card.svg?source=amazon%2Fchronos-2&v=20260501clipcard" | |
| alt="Open amazon/chronos-2 in hfviewer" | |
| width="100%" | |
| /> | |
| </a> | |
| ## Highlights | |
| - **Per-tensor INT8** activations + **per-channel INT8** weights via | |
| embedl-deploy's PTQ flow on top of TensorRT's fused MHA kernel. | |
| No QAT or distillation needed. | |
| - **Drop-in replacement** for `amazon/chronos-2` inference: same | |
| `(context, group_ids) → quantile_preds` signature; 21 evenly | |
| spaced quantile levels with the median at index 10. | |
| - **Validated** on the [GIFT-Eval](https://huggingface.co/datasets/Salesforce/GiftEval) | |
| benchmark across 125 task configurations. See Accuracy below. | |
| - **Two ctx variants** so you can pick the latency/history-window | |
| trade-off that fits your deployment. | |
| ## Quick Start | |
| ```bash | |
| pip install tensorrt pycuda numpy | |
| python infer_trt.py --ctx 512 # 1.2× faster than FP16 on Orin | |
| python infer_trt.py --ctx 2048 # 1.3× faster than FP16 on Orin | |
| ``` | |
| The `infer_trt.py` helper script builds a TensorRT engine from the | |
| ONNX on first run (cached as `*.engine` next to the artifact) and | |
| feeds a synthetic seasonal context for demonstration. Replace the | |
| context generator with your own series of the right length. | |
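Because both variants use a static context, a real series has to be brought to exactly 512 or 2048 points before inference. The helper below is a minimal sketch of one way to do that; `fit_context` is a hypothetical name, the `(1, ctx)` float32 input layout is an assumption inferred from the batch-1 output shape, and left-padding short series with NaN is an assumption about how the exported graph treats missing history. Check `infer_trt.py` for the layout and padding the engine actually expects.

```python
import numpy as np


def fit_context(series, ctx, pad_value=np.nan):
    """Return a (1, ctx) float32 array holding the most recent `ctx` points.

    Longer series keep only their last `ctx` values. Shorter series are
    left-padded with `pad_value` (NaN by default, which is an assumption
    and may need to change for this engine).
    """
    values = np.asarray(series, dtype=np.float32).ravel()
    if values.size >= ctx:
        fitted = values[-ctx:]
    else:
        pad = np.full(ctx - values.size, pad_value, dtype=np.float32)
        fitted = np.concatenate([pad, values])
    return fitted.reshape(1, ctx)


# Example: 600 observations trimmed to the ctx=512 variant.
context = fit_context(np.arange(600), ctx=512)
print(context.shape, context.dtype)
```

Keeping the newest values at the right edge preserves the alignment between the end of the history and the start of the forecast horizon.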
| ## Files | |
| | File | Purpose | | |
| |---|---| | |
| | `embedl_chronos_2_ctx512_int8.onnx` | INT8 ONNX with Q/DQ nodes (ctx=512, 1024-step horizon). | |
| | `embedl_chronos_2_ctx2048_int8.onnx` | INT8 ONNX with Q/DQ nodes (ctx=2048, 1024-step horizon). | |
| | `infer_trt.py` | ONNX Runtime / TensorRT inference example. | | |
| Both artifacts emit a `(1, 21, 1024)` quantile tensor: 21 quantile | |
| levels, each spanning 64 output patches × 16 steps per patch = 1024 | |
| horizon steps. Slice the median (`preds[0, 10]`) for a point forecast and | |
| truncate it to the prediction length you need. | |
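The slicing described above can be sketched in a few lines of NumPy. The function names `point_forecast` and `quantile_band` are hypothetical, and the sketch addresses quantiles by index only, since this card does not list the numeric quantile levels.

```python
import numpy as np

MEDIAN_INDEX = 10  # per this card: the median sits at index 10 of 21 levels


def _check(preds, prediction_length):
    """Validate the (batch, 21, horizon) layout described in this card."""
    preds = np.asarray(preds)
    if preds.ndim != 3 or preds.shape[1] != 21:
        raise ValueError(f"expected (batch, 21, horizon), got {preds.shape}")
    if not 0 < prediction_length <= preds.shape[2]:
        raise ValueError("prediction_length must be between 1 and the horizon")
    return preds


def point_forecast(preds, prediction_length):
    """Median path of the first series, truncated to prediction_length steps."""
    preds = _check(preds, prediction_length)
    return preds[0, MEDIAN_INDEX, :prediction_length]


def quantile_band(preds, prediction_length, lo_index, hi_index):
    """Lower and upper quantile paths, selected by quantile index."""
    preds = _check(preds, prediction_length)
    return (preds[0, lo_index, :prediction_length],
            preds[0, hi_index, :prediction_length])


# Stand-in for an engine output; the real values come from the TensorRT run.
dummy = np.zeros((1, 21, 1024), dtype=np.float32)
print(point_forecast(dummy, prediction_length=48).shape)
```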
| ## Performance | |
| Latency measured with TensorRT + `trtexec`, GPU compute time only | |
| (`--noDataTransfers`), CUDA Graph + Spin Wait enabled, clocks locked | |
| (`nvpmodel -m 0 && jetson_clocks` on Jetson). | |
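For readers who want to reproduce a comparable measurement, the sketch below assembles a `trtexec` command line from the settings named above. `--noDataTransfers` is quoted from this card, and `--useCudaGraph` and `--useSpinWait` are the `trtexec` options that correspond to "CUDA Graph + Spin Wait enabled". The precision flags are an assumption: the card does not give the exact command used, and `trtexec_command` is a hypothetical helper.

```python
import shlex


def trtexec_command(onnx_path, precision_flags=("--int8", "--fp16")):
    """Build a trtexec argument list for a GPU-compute-only latency run.

    precision_flags is an assumption: a Q/DQ ONNX needs INT8 enabled, and
    FP16 is allowed here for layers that are not quantized.
    """
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        *precision_flags,
        "--noDataTransfers",  # exclude host/device copies from the timing
        "--useCudaGraph",     # replay the enqueue through a CUDA graph
        "--useSpinWait",      # spin instead of blocking on synchronization
    ]


cmd = trtexec_command("embedl_chronos_2_ctx512_int8.onnx")
print(shlex.join(cmd))
# To run it on the device: subprocess.run(cmd, check=True)
```

Passing `precision_flags=("--fp16",)` or `("--best",)` would give builds analogous to the FP16 and `--best` rows reported below, again under the assumption that no further options were used.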
| ### Jetson AGX Orin (MAXN) | |
| #### ctx=512 | |
| <p align="center"> | |
| <img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/chronos-2-quantized-trt/chronos-2-quantized-trt__orin-mountain-view__latency_ctx512.svg" alt="Chronos-2 INT8 latency, ctx=512" width="640"> | |
| </p> | |
| | Build | Mean latency (ms) | | |
| |---|---| | |
| | TensorRT FP16 | **2.977** | | |
| | TensorRT `--best` | 2.974 | | |
| | **embedl INT8** | **2.432** | | |
| | Speedup (FP16 → embedl INT8) | **1.22×** | |
| #### ctx=2048 | |
| <p align="center"> | |
| <img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/chronos-2-quantized-trt/chronos-2-quantized-trt__orin-mountain-view__latency_ctx2048.svg" alt="Chronos-2 INT8 latency, ctx=2048" width="640"> | |
| </p> | |
| | Build | Mean latency (ms) | | |
| |---|---| | |
| | TensorRT FP16 | **4.482** | | |
| | TensorRT `--best` | 4.482 | | |
| | **embedl INT8** | **3.482** | | |
| | Speedup (FP16 → embedl INT8) | **1.29×** | |
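The speedup rows are the ratio of the FP16 mean latency to the INT8 mean latency. A short check using the figures from the two tables (the `speedup` helper is just illustrative):

```python
def speedup(fp16_ms, int8_ms):
    """Speedup of the INT8 build relative to the FP16 build."""
    return fp16_ms / int8_ms


# Mean latencies in milliseconds, copied from the Jetson AGX Orin tables.
latency_ms = {
    "ctx=512": {"fp16": 2.977, "int8": 2.432},
    "ctx=2048": {"fp16": 4.482, "int8": 3.482},
}

for name, t in latency_ms.items():
    saved = t["fp16"] - t["int8"]
    print(f"{name}: {speedup(t['fp16'], t['int8']):.2f}x, "
          f"{saved:.3f} ms saved per inference")
```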
| ## Accuracy | |
| Evaluated on the | |
| [GIFT-Eval](https://huggingface.co/datasets/Salesforce/GiftEval) | |
| benchmark: 125 task configurations spanning 50 datasets × | |
| {short, medium, long} horizons. Aggregate WQL (weighted quantile | |
| loss, lower is better) reported using the | |
| [TIME-paper normalization](https://arxiv.org/html/2602.12147v2): | |
| geomean of per-task ratio against the Seasonal-Naive baseline. | |
| | Metric | FP32 baseline | **embedl INT8 ctx=512** | **embedl INT8 ctx=2048** | | |
| |---|---|---|---| | |
| | Geomean WQL / Seasonal-Naive | 0.549 | **0.634** | **0.618** | | |
| | Geomean WQL / FP32 | 1.000 | **1.156×** | **1.126×** | |
| | Median WQL / FP32 | 1.000 | 1.074× | 1.045× | |
| | Cells within 10 % of FP32 | n/a | 71 / 125 (57 %) | 79 / 125 (63 %) | |
| | Cells within 20 % of FP32 | n/a | 96 / 125 (77 %) | 98 / 125 (78 %) | |
| | Cells beating FP32 | n/a | 14 / 125 | 19 / 125 | |
| **How to read the headline number.** Geomean WQL/S-Naive values of 0.634 | |
| (ctx=512) and 0.618 (ctx=2048) mean the INT8 models retain the | |
| bulk of `chronos-2`'s skill margin over the no-model Seasonal-Naive | |
| baseline. The FP32 model sits at 0.549 by the same convention; the | |
| INT8 versions score about 13-16 % higher (worse) in geomean WQL but | |
| still convincingly beat S-Naive, whose ratio is 1.0 by construction. | |
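The aggregate rows can be recomputed from per-task WQL values along the following lines. This is a sketch under assumptions: the function names are hypothetical, the sample numbers are made up for illustration, and "within 10 % of FP32" is read here as a per-task ratio of at most 1.10, which may differ from the rule used to build the table.

```python
import numpy as np


def geomean_ratio(model_wql, reference_wql):
    """Geometric mean of per-task WQL ratios (model / reference)."""
    ratios = (np.asarray(model_wql, dtype=float)
              / np.asarray(reference_wql, dtype=float))
    return float(np.exp(np.mean(np.log(ratios))))


def cells_within(model_wql, reference_wql, tolerance):
    """Count tasks whose WQL ratio is at most 1 + tolerance (assumed rule)."""
    ratios = (np.asarray(model_wql, dtype=float)
              / np.asarray(reference_wql, dtype=float))
    return int(np.sum(ratios <= 1.0 + tolerance))


# Made-up per-task WQL values, for illustration only.
int8_wql = [0.105, 0.260, 0.045]
fp32_wql = [0.100, 0.200, 0.050]

print(geomean_ratio(int8_wql, fp32_wql))
print(cells_within(int8_wql, fp32_wql, tolerance=0.10))
```

Because the geometric mean is multiplicative, the INT8-to-FP32 geomean ratio equals the quotient of the two S-Naive-normalized geomeans over the same tasks: 0.634 / 0.549 ≈ 1.155, which matches the 1.156× in the table up to rounding of the inputs.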
| **Where the regression concentrates.** Worst-case cells are | |
| out-of-distribution low-frequency series (`us_births/M`, | |
| `m4_hourly/{medium,long}`) and high-frequency long-horizon | |
| forecasts (`solar/10T/{medium,long}`). The full per-task CSVs | |
| ship with the artifacts; check them before deploying to a domain | |
| that resembles those outliers. | |
| ## Creating Your Own Optimized Models | |
| This artifact was produced with | |
| [embedl-deploy](https://github.com/embedl/embedl-deploy), Embedl's | |
| open-source PyTorch → TensorRT deployment library. The same workflow | |
| applies to your own models; see | |
| [the documentation](https://github.com/embedl/embedl-deploy#readme) | |
| for installation and usage. | |
| ## License | |
| | Component | License | | |
| |---|---| | |
| | Optimized model artifacts (this repo) | [Embedl Models Community Licence v1.0](https://github.com/embedl/embedl-models/blob/main/LICENSE) (no redistribution as a hosted service) | |
| | Upstream architecture and weights | [Amazon Chronos-2 License](https://huggingface.co/amazon/chronos-2/blob/main/LICENSE) | | |
| ## Contact | |
| We offer engineering support for on-prem/edge deployments and partner | |
| co-marketing opportunities. Reach out at | |
| [contact@embedl.com](mailto:contact@embedl.com), or open an issue on | |
| [GitHub](https://github.com/embedl/embedl-deploy). | |
| <!-- embedl-discord-banner:start --> | |
| <style> | |
| .embedl-discord-btn { transition: background 160ms ease, box-shadow 160ms ease; } | |
| .embedl-discord-btn:hover { background: #6C77F5 !important; box-shadow: 0 8px 22px rgba(88,101,242,0.55) !important; } | |
| </style> | |
| <div style="background:radial-gradient(600px 220px at 0% 50%,rgba(88,101,242,0.22) 0%,rgba(88,101,242,0) 60%),radial-gradient(400px 180px at 100% 100%,rgba(88,101,242,0.10) 0%,rgba(88,101,242,0) 55%),linear-gradient(135deg,#0B1626 0%,#142338 100%);border:1px solid rgba(88,101,242,0.35);border-radius:12px;padding:22px 24px;margin:24px 0 0 0;color:#F2F6FA;box-shadow:0 4px 16px rgba(11,22,38,0.18);overflow:hidden;box-sizing:border-box;max-width:100%;"> | |
| <table style="width:100%;border-collapse:collapse;border:0;background:transparent;"> | |
| <tr style="background:transparent;"> | |
| <td style="vertical-align:middle;border:0;padding:0;background:transparent;"> | |
| <div style="display:inline-block;font-size:10px;letter-spacing:0.08em;text-transform:uppercase;font-weight:700;color:#A5B4FC;background:rgba(88,101,242,0.18);border:1px solid rgba(88,101,242,0.45);padding:4px 10px;border-radius:999px;margin-bottom:10px;white-space:nowrap;">Community & support</div> | |
| <div style="font-size:15px;font-weight:700;line-height:1.35;color:#F2F6FA;margin-bottom:4px;">Need help with this model? Chat with the Embedl team and other engineers on <span style="color:#A5B4FC;white-space:nowrap;">Discord</span>.</div> | |
| <div style="font-size:13px;color:#9BA7B5;">Quantization gotchas, hardware questions, fine-tuning tips: bring them all.</div> | |
| </td> | |
| <td width="1%" style="vertical-align:middle;border:0;padding:0 0 0 18px;white-space:nowrap;text-align:right;background:transparent;"> | |
| <a href="https://discord.gg/MTbMWdKqE" class="embedl-discord-btn" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #5865F2;background:#5865F2;color:#FFFFFF;text-decoration:none;box-shadow:0 6px 18px rgba(88,101,242,0.35);">Join our Discord →</a> | |
| </td> | |
| </tr> | |
| </table> | |
| </div> | |
| <!-- embedl-discord-banner:end --> | |