Time Series Forecasting
ONNX
TensorRT
time-series
chronos
chronos-2
int8
quantization
edge
jetson
orin
Fixed broken stuff in the model card
- README.md +62 -22
- infer_trt.py +2 -3
README.md
CHANGED
|
@@ -26,11 +26,34 @@ extra_gated_description: |
|
|
| 26 |
extra_gated_button_content: Request access
|
| 27 |
---
|
| 28 |
|
| 29 |
-
<
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
|
| 35 |
# Embedl Chronos-2 (Quantized for TensorRT)
|
| 36 |
|
|
@@ -42,14 +65,24 @@ low-latency NVIDIA TensorRT inference on edge GPUs. Two
|
|
| 42 |
static-context variants ship: **ctx=512** for short-history
|
| 43 |
forecasting and **ctx=2048** for long-history use cases.
|
| 44 |
|
| 45 |
## Highlights
|
| 46 |
|
| 47 |
- **Per-tensor INT8** activations + **per-channel INT8** weights via
|
| 48 |
embedl-deploy's PTQ flow on top of TensorRT's fused MHA kernel.
|
| 49 |
No QAT or distillation needed.
|
| 50 |
- **Drop-in replacement** for `amazon/chronos-2` inference: same
|
| 51 |
-
`(context, group_ids) → quantile_preds` signature; 21
|
| 52 |
-
|
| 53 |
- **Validated** on the [GIFT-Eval](https://huggingface.co/datasets/Salesforce/GiftEval)
|
| 54 |
benchmark across 125 task configurations. See Accuracy below.
|
| 55 |
- **Two ctx variants** so you can pick the latency/history-window
|
|
@@ -86,7 +119,7 @@ Latency measured with TensorRT + `trtexec`, GPU compute time only
|
|
| 86 |
(`--noDataTransfers`), CUDA Graph + Spin Wait enabled, clocks locked
|
| 87 |
(`nvpmodel -m 0 && jetson_clocks` on Jetson).
|
| 88 |
|
| 89 |
-
### Jetson Orin
|
| 90 |
|
| 91 |
#### ctx=512
|
| 92 |
|
|
@@ -147,20 +180,6 @@ forecasts (`solar/10T/{medium,long}`). The full per-task CSVs
|
|
| 147 |
ship with the artifacts; check them before deploying to a domain
|
| 148 |
that resembles those outliers.
|
| 149 |
|
| 150 |
-
## How This Was Built
|
| 151 |
-
|
| 152 |
-
1. **Trace.** `torch.export` on `amazon/chronos-2` at static
|
| 153 |
-
ctx={512, 2048}, `num_output_patches=64` (1024-step horizon).
|
| 154 |
-
2. **Transform.** embedl-deploy's TensorRT pattern set recomposes
|
| 155 |
-
chronos-2's manual SDPA + RoPE into fused INT8-ready modules.
|
| 156 |
-
3. **Quantize (PTQ).** Per-tensor symmetric INT8 activations +
|
| 157 |
-
per-channel INT8 weights. Calibrated on a 1327-window cohort
|
| 158 |
-
drawn from 28 diverse GIFT-Eval **training** splits (no eval-set
|
| 159 |
-
leakage).
|
| 160 |
-
4. **Export.** ONNX opset 20 + onnxslim + onnxsim simplification,
|
| 161 |
-
with Bool→Float casts inserted before Einsum nodes for TRT 10.x
|
| 162 |
-
compatibility.
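Step 3 above (per-tensor symmetric INT8 activations, per-channel INT8 weights) can be sketched in NumPy. This is an illustration only, not embedl-deploy's implementation: the function names are hypothetical, the scale is assumed to be max-abs / 127, and the activation range is taken from the sample tensor itself rather than from a calibration cohort.

```python
import numpy as np


def quantize_weights_per_channel(w: np.ndarray):
    """Symmetric INT8 quantization with one scale per output channel (row)."""
    max_abs = np.abs(w).max(axis=1, keepdims=True)
    # Assumed scale rule: map each row's largest magnitude to 127.
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def quantize_activations_per_tensor(x: np.ndarray, max_abs: float):
    """Symmetric INT8 quantization with a single scale for the whole tensor."""
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)  # (out_channels, in_features)
x = rng.normal(size=(3, 8)).astype(np.float32)  # (batch, in_features)

qw, sw = quantize_weights_per_channel(w)
# Stand-in for calibration: use the observed range of this one tensor.
qx, sx = quantize_activations_per_tensor(x, float(np.abs(x).max()))

# Integer matmul accumulated in int32, then rescaled back to float.
y_int = qx.astype(np.int32) @ qw.astype(np.int32).T
y_hat = y_int * (sx * sw.T)
y_ref = x @ w.T
max_err = float(np.abs(y_hat - y_ref).max())
print(f"max abs error vs float matmul: {max_err:.4f}")
```

Per-channel weight scales keep a single large row from coarsening the grid for every other row, while a single activation scale keeps the rescaling step to one multiply per output channel.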
|
| 163 |
-
|
| 164 |
## Creating Your Own Optimized Models
|
| 165 |
|
| 166 |
This artifact was produced with
|
|
@@ -183,3 +202,24 @@ We offer engineering support for on-prem/edge deployments and partner
|
|
| 183 |
co-marketing opportunities. Reach out at
|
| 184 |
[contact@embedl.com](mailto:contact@embedl.com), or open an issue on
|
| 185 |
[GitHub](https://github.com/embedl/embedl-deploy).
|
| 26 |
extra_gated_button_content: Request access
|
| 27 |
---
|
| 28 |
|
| 29 |
+
<!-- embedl-banner:start -->
|
| 30 |
+
<style>
|
| 31 |
+
.embedl-btn-primary { transition: background 160ms ease, box-shadow 160ms ease; }
|
| 32 |
+
.embedl-btn-primary:hover { background: #4FDCE4 !important; box-shadow: 0 8px 22px rgba(45,212,221,0.45) !important; }
|
| 33 |
+
.embedl-btn-secondary { transition: background 160ms ease; }
|
| 34 |
+
.embedl-btn-secondary:hover { background: rgba(45,212,221,0.15) !important; }
|
| 35 |
+
.embedl-headline { font-size: clamp(11px, 2.15vw, 15px) !important; }
|
| 36 |
+
.embedl-btn-primary, .embedl-btn-secondary {
|
| 37 |
+
font-size: clamp(11px, 1.65vw, 13px) !important;
|
| 38 |
+
padding: clamp(6px, 1.1vw, 9px) clamp(10px, 1.6vw, 14px) !important;
|
| 39 |
+
}
|
| 40 |
+
</style>
|
| 41 |
+
<div style="background:radial-gradient(600px 220px at 0% 50%,rgba(45,212,221,0.22) 0%,rgba(45,212,221,0) 60%),radial-gradient(400px 180px at 100% 100%,rgba(45,212,221,0.10) 0%,rgba(45,212,221,0) 55%),linear-gradient(135deg,#0B1626 0%,#142338 100%);border:1px solid rgba(45,212,221,0.28);border-radius:12px;padding:22px 24px;margin:0 0 24px 0;color:#F2F6FA;box-shadow:0 4px 16px rgba(11,22,38,0.18);overflow:hidden;box-sizing:border-box;max-width:100%;">
|
| 42 |
+
<table style="width:100%;border-collapse:collapse;border:0;background:transparent;">
|
| 43 |
+
<tr style="background:transparent;">
|
| 44 |
+
<td style="vertical-align:middle;border:0;padding:0;background:transparent;">
|
| 45 |
+
<div style="display:inline-block;font-size:10px;letter-spacing:0.08em;text-transform:uppercase;font-weight:700;color:#2DD4DD;background:rgba(45,212,221,0.15);border:1px solid rgba(45,212,221,0.35);padding:4px 10px;border-radius:999px;margin-bottom:10px;white-space:nowrap;">Optimized by Embedl</div>
|
| 46 |
+
<div class="embedl-headline" style="font-size:15px;font-weight:700;line-height:1.35;color:#F2F6FA;margin-bottom:4px;">Need to <span style="color:#2DD4DD;white-space:nowrap;">fine-tune</span>, hit <span style="color:#2DD4DD;white-space:nowrap;">performance targets</span>, or deploy on <span style="color:#2DD4DD;white-space:nowrap;">specific hardware</span>?</div>
|
| 47 |
+
<div style="font-size:13px;color:#9BA7B5;">We've got you covered.</div>
|
| 48 |
+
</td>
|
| 49 |
+
<td width="1%" style="vertical-align:middle;border:0;padding:0 0 0 18px;white-space:nowrap;text-align:right;background:transparent;">
|
| 50 |
+
<a href="https://www.embedl.com/models" class="embedl-btn-secondary" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #2DD4DD;color:#2DD4DD;text-decoration:none;margin-right:8px;">Learn more</a>
|
| 51 |
+
<a href="https://www.embedl.com/contact" class="embedl-btn-primary" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #2DD4DD;background:#2DD4DD;color:#0B1626;text-decoration:none;box-shadow:0 6px 18px rgba(45,212,221,0.28);">Get in touch →</a>
|
| 52 |
+
</td>
|
| 53 |
+
</tr>
|
| 54 |
+
</table>
|
| 55 |
+
</div>
|
| 56 |
+
<!-- embedl-banner:end -->
|
| 57 |
|
| 58 |
# Embedl Chronos-2 (Quantized for TensorRT)
|
| 59 |
|
|
|
|
| 65 |
static-context variants ship: **ctx=512** for short-history
|
| 66 |
forecasting and **ctx=2048** for long-history use cases.
|
| 67 |
|
| 68 |
+
## Upstream Model
|
| 69 |
+
|
| 70 |
+
<a href="https://hfviewer.com/amazon/chronos-2?utm_source=huggingface&utm_medium=embedded_model_card&utm_campaign=amazon__chronos-2_card" target="_blank" rel="noopener">
|
| 71 |
+
<img
|
| 72 |
+
src="https://hfviewer.com/api/card.svg?source=amazon%2Fchronos-2&v=20260501clipcard"
|
| 73 |
+
alt="Open amazon/chronos-2 in hfviewer"
|
| 74 |
+
width="100%"
|
| 75 |
+
/>
|
| 76 |
+
</a>
|
| 77 |
+
|
| 78 |
## Highlights
|
| 79 |
|
| 80 |
- **Per-tensor INT8** activations + **per-channel INT8** weights via
|
| 81 |
embedl-deploy's PTQ flow on top of TensorRT's fused MHA kernel.
|
| 82 |
No QAT or distillation needed.
|
| 83 |
- **Drop-in replacement** for `amazon/chronos-2` inference: same
|
| 84 |
+
`(context, group_ids) → quantile_preds` signature; 21 evenly
|
| 85 |
+
spaced quantile levels with the median at index 10.
|
| 86 |
- **Validated** on the [GIFT-Eval](https://huggingface.co/datasets/Salesforce/GiftEval)
|
| 87 |
benchmark across 125 task configurations. See Accuracy below.
|
| 88 |
- **Two ctx variants** so you can pick the latency/history-window
|
|
|
|
| 119 |
(`--noDataTransfers`), CUDA Graph + Spin Wait enabled, clocks locked
|
| 120 |
(`nvpmodel -m 0 && jetson_clocks` on Jetson).
|
| 121 |
|
| 122 |
+
### Jetson AGX Orin (MAXN)
|
| 123 |
|
| 124 |
#### ctx=512
|
| 125 |
|
|
|
|
| 180 |
ship with the artifacts; check them before deploying to a domain
|
| 181 |
that resembles those outliers.
|
| 182 |
|
| 183 |
## Creating Your Own Optimized Models
|
| 184 |
|
| 185 |
This artifact was produced with
|
|
|
|
| 202 |
co-marketing opportunities. Reach out at
|
| 203 |
[contact@embedl.com](mailto:contact@embedl.com), or open an issue on
|
| 204 |
[GitHub](https://github.com/embedl/embedl-deploy).
|
| 205 |
+
|
| 206 |
+
<!-- embedl-discord-banner:start -->
|
| 207 |
+
<style>
|
| 208 |
+
.embedl-discord-btn { transition: background 160ms ease, box-shadow 160ms ease; }
|
| 209 |
+
.embedl-discord-btn:hover { background: #6C77F5 !important; box-shadow: 0 8px 22px rgba(88,101,242,0.55) !important; }
|
| 210 |
+
</style>
|
| 211 |
+
<div style="background:radial-gradient(600px 220px at 0% 50%,rgba(88,101,242,0.22) 0%,rgba(88,101,242,0) 60%),radial-gradient(400px 180px at 100% 100%,rgba(88,101,242,0.10) 0%,rgba(88,101,242,0) 55%),linear-gradient(135deg,#0B1626 0%,#142338 100%);border:1px solid rgba(88,101,242,0.35);border-radius:12px;padding:22px 24px;margin:24px 0 0 0;color:#F2F6FA;box-shadow:0 4px 16px rgba(11,22,38,0.18);overflow:hidden;box-sizing:border-box;max-width:100%;">
|
| 212 |
+
<table style="width:100%;border-collapse:collapse;border:0;background:transparent;">
|
| 213 |
+
<tr style="background:transparent;">
|
| 214 |
+
<td style="vertical-align:middle;border:0;padding:0;background:transparent;">
|
| 215 |
+
<div style="display:inline-block;font-size:10px;letter-spacing:0.08em;text-transform:uppercase;font-weight:700;color:#A5B4FC;background:rgba(88,101,242,0.18);border:1px solid rgba(88,101,242,0.45);padding:4px 10px;border-radius:999px;margin-bottom:10px;white-space:nowrap;">Community & support</div>
|
| 216 |
+
<div style="font-size:15px;font-weight:700;line-height:1.35;color:#F2F6FA;margin-bottom:4px;">Need help with this model? Chat with the Embedl team and other engineers on <span style="color:#A5B4FC;white-space:nowrap;">Discord</span>.</div>
|
| 217 |
+
<div style="font-size:13px;color:#9BA7B5;">Quantization gotchas, hardware questions, fine-tuning tips — bring them all.</div>
|
| 218 |
+
</td>
|
| 219 |
+
<td width="1%" style="vertical-align:middle;border:0;padding:0 0 0 18px;white-space:nowrap;text-align:right;background:transparent;">
|
| 220 |
+
<a href="https://discord.gg/MTbMWdKqE" class="embedl-discord-btn" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #5865F2;background:#5865F2;color:#FFFFFF;text-decoration:none;box-shadow:0 6px 18px rgba(88,101,242,0.35);">Join our Discord →</a>
|
| 221 |
+
</td>
|
| 222 |
+
</tr>
|
| 223 |
+
</table>
|
| 224 |
+
</div>
|
| 225 |
+
<!-- embedl-discord-banner:end -->
|
infer_trt.py
CHANGED
|
@@ -22,9 +22,8 @@ from pathlib import Path
|
|
| 22 |
import numpy as np
|
| 23 |
import onnxruntime as ort
|
| 24 |
|
| 25 |
-
#
|
| 26 |
-
#
|
| 27 |
-
QUANTILE_LEVELS = [round(0.01 + 0.05 * i, 2) for i in range(21)]
|
| 28 |
MEDIAN_IDX = 10
|
| 29 |
NUM_OUTPUT_PATCHES = 64 # baked into the ONNX
|
| 30 |
OUTPUT_PATCH_SIZE = 16 # baked into the ONNX
|
|
|
|
| 22 |
import numpy as np
|
| 23 |
import onnxruntime as ort
|
| 24 |
|
| 25 |
+
# chronos-2 emits 21 evenly spaced quantile levels along axis 1 of
|
| 26 |
+
# the output. The median (q=0.5) is element 10.
|
|
|
|
| 27 |
MEDIAN_IDX = 10
|
| 28 |
NUM_OUTPUT_PATCHES = 64 # baked into the ONNX
|
| 29 |
OUTPUT_PATCH_SIZE = 16 # baked into the ONNX
|
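A minimal sketch of how the constants above might be used to pull the median trajectory out of a model output. It assumes the output is a 3-D array laid out as `(batch, 21, horizon)`, with the horizon equal to `NUM_OUTPUT_PATCHES * OUTPUT_PATCH_SIZE`. The `median_forecast` helper and the dummy array are hypothetical stand-ins, not part of `infer_trt.py`.

```python
import numpy as np

NUM_QUANTILES = 21        # the card states 21 quantile levels on axis 1
MEDIAN_IDX = 10           # median (q=0.5) is element 10
NUM_OUTPUT_PATCHES = 64   # baked into the ONNX
OUTPUT_PATCH_SIZE = 16    # baked into the ONNX
HORIZON = NUM_OUTPUT_PATCHES * OUTPUT_PATCH_SIZE  # 64 * 16 = 1024 steps


def median_forecast(quantile_preds: np.ndarray) -> np.ndarray:
    """Return the median trajectory from a (batch, 21, horizon) array."""
    if quantile_preds.ndim != 3 or quantile_preds.shape[1] != NUM_QUANTILES:
        raise ValueError(
            f"expected (batch, {NUM_QUANTILES}, horizon), got {quantile_preds.shape}"
        )
    return quantile_preds[:, MEDIAN_IDX, :]


# Dummy data standing in for a real model output (assumed layout).
dummy = np.arange(2 * NUM_QUANTILES * HORIZON, dtype=np.float32).reshape(
    2, NUM_QUANTILES, HORIZON
)
median = median_forecast(dummy)
print(median.shape)
```

Validating the quantile axis before indexing guards against silently reading the wrong slice if an engine is exported with a different output layout.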