dann-od committed on
Commit 07e5183 · verified · 1 Parent(s): 17b6c94

Fixed broken stuff in the model card

Files changed (2)
  1. README.md +62 -22
  2. infer_trt.py +2 -3
README.md CHANGED
@@ -26,11 +26,34 @@ extra_gated_description: |
26
  extra_gated_button_content: Request access
27
  ---
28
 
29
- <p align="center">
30
- <a href="https://embedl.com">
31
- <img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/embedl_banner.svg" alt="Embedl" width="240">
32
- </a>
33
- </p>
34
 
35
  # Embedl Chronos-2 (Quantized for TensorRT)
36
 
@@ -42,14 +65,24 @@ low-latency NVIDIA TensorRT inference on edge GPUs. Two
42
  static-context variants ship: **ctx=512** for short-history
43
  forecasting and **ctx=2048** for long-history use cases.
44
 
45
  ## Highlights
46
 
47
  - **Per-tensor INT8** activations + **per-channel INT8** weights via
48
  embedl-deploy's PTQ flow on top of TensorRT's fused MHA kernel.
49
  No QAT or distillation needed.
50
  - **Drop-in replacement** for `amazon/chronos-2` inference: same
51
- `(context, group_ids) → quantile_preds` signature; 21-quantile
52
- output grid (0.01..0.99 step 0.05).
53
  - **Validated** on the [GIFT-Eval](https://huggingface.co/datasets/Salesforce/GiftEval)
54
  benchmark across 125 task configurations. See Accuracy below.
55
  - **Two ctx variants** so you can pick the latency/history-window
@@ -86,7 +119,7 @@ Latency measured with TensorRT + `trtexec`, GPU compute time only
86
  (`--noDataTransfers`), CUDA Graph + Spin Wait enabled, clocks locked
87
  (`nvpmodel -m 0 && jetson_clocks` on Jetson).
88
 
89
- ### Jetson Orin Nano Super
90
 
91
  #### ctx=512
92
 
@@ -147,20 +180,6 @@ forecasts (`solar/10T/{medium,long}`). The full per-task CSVs
147
  ship with the artifacts; check them before deploying to a domain
148
  that resembles those outliers.
149
 
150
- ## How This Was Built
151
-
152
- 1. **Trace.** `torch.export` on `amazon/chronos-2` at static
153
- ctx={512, 2048}, `num_output_patches=64` (1024-step horizon).
154
- 2. **Transform.** embedl-deploy's TensorRT pattern set recomposes
155
- chronos-2's manual SDPA + RoPE into fused INT8-ready modules.
156
- 3. **Quantize (PTQ).** Per-tensor symmetric INT8 activations +
157
- per-channel INT8 weights. Calibrated on a 1327-window cohort
158
- drawn from 28 diverse GIFT-Eval **training** splits (no eval-set
159
- leakage).
160
- 4. **Export.** ONNX opset 20 + onnxslim + onnxsim simplification,
161
- with Bool→Float casts inserted before Einsum nodes for TRT 10.x
162
- compatibility.
163
-
164
  ## Creating Your Own Optimized Models
165
 
166
  This artifact was produced with
@@ -183,3 +202,24 @@ We offer engineering support for on-prem/edge deployments and partner
183
  co-marketing opportunities. Reach out at
184
  [contact@embedl.com](mailto:contact@embedl.com), or open an issue on
185
  [GitHub](https://github.com/embedl/embedl-deploy).
 
26
  extra_gated_button_content: Request access
27
  ---
28
 
29
+ <!-- embedl-banner:start -->
30
+ <style>
31
+ .embedl-btn-primary { transition: background 160ms ease, box-shadow 160ms ease; }
32
+ .embedl-btn-primary:hover { background: #4FDCE4 !important; box-shadow: 0 8px 22px rgba(45,212,221,0.45) !important; }
33
+ .embedl-btn-secondary { transition: background 160ms ease; }
34
+ .embedl-btn-secondary:hover { background: rgba(45,212,221,0.15) !important; }
35
+ .embedl-headline { font-size: clamp(11px, 2.15vw, 15px) !important; }
36
+ .embedl-btn-primary, .embedl-btn-secondary {
37
+ font-size: clamp(11px, 1.65vw, 13px) !important;
38
+ padding: clamp(6px, 1.1vw, 9px) clamp(10px, 1.6vw, 14px) !important;
39
+ }
40
+ </style>
41
+ <div style="background:radial-gradient(600px 220px at 0% 50%,rgba(45,212,221,0.22) 0%,rgba(45,212,221,0) 60%),radial-gradient(400px 180px at 100% 100%,rgba(45,212,221,0.10) 0%,rgba(45,212,221,0) 55%),linear-gradient(135deg,#0B1626 0%,#142338 100%);border:1px solid rgba(45,212,221,0.28);border-radius:12px;padding:22px 24px;margin:0 0 24px 0;color:#F2F6FA;box-shadow:0 4px 16px rgba(11,22,38,0.18);overflow:hidden;box-sizing:border-box;max-width:100%;">
42
+ <table style="width:100%;border-collapse:collapse;border:0;background:transparent;">
43
+ <tr style="background:transparent;">
44
+ <td style="vertical-align:middle;border:0;padding:0;background:transparent;">
45
+ <div style="display:inline-block;font-size:10px;letter-spacing:0.08em;text-transform:uppercase;font-weight:700;color:#2DD4DD;background:rgba(45,212,221,0.15);border:1px solid rgba(45,212,221,0.35);padding:4px 10px;border-radius:999px;margin-bottom:10px;white-space:nowrap;">Optimized by Embedl</div>
46
+ <div class="embedl-headline" style="font-size:15px;font-weight:700;line-height:1.35;color:#F2F6FA;margin-bottom:4px;">Need to <span style="color:#2DD4DD;white-space:nowrap;">fine-tune</span>, hit <span style="color:#2DD4DD;white-space:nowrap;">performance targets</span>, or deploy on <span style="color:#2DD4DD;white-space:nowrap;">specific hardware</span>?</div>
47
+ <div style="font-size:13px;color:#9BA7B5;">We've got you covered.</div>
48
+ </td>
49
+ <td width="1%" style="vertical-align:middle;border:0;padding:0 0 0 18px;white-space:nowrap;text-align:right;background:transparent;">
50
+ <a href="https://www.embedl.com/models" class="embedl-btn-secondary" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #2DD4DD;color:#2DD4DD;text-decoration:none;margin-right:8px;">Learn more</a>
51
+ <a href="https://www.embedl.com/contact" class="embedl-btn-primary" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #2DD4DD;background:#2DD4DD;color:#0B1626;text-decoration:none;box-shadow:0 6px 18px rgba(45,212,221,0.28);">Get in touch →</a>
52
+ </td>
53
+ </tr>
54
+ </table>
55
+ </div>
56
+ <!-- embedl-banner:end -->
57
 
58
  # Embedl Chronos-2 (Quantized for TensorRT)
59
 
 
65
  static-context variants ship: **ctx=512** for short-history
66
  forecasting and **ctx=2048** for long-history use cases.
67
 
68
+ ## Upstream Model
69
+
70
+ <a href="https://hfviewer.com/amazon/chronos-2?utm_source=huggingface&amp;utm_medium=embedded_model_card&amp;utm_campaign=amazon__chronos-2_card" target="_blank" rel="noopener">
71
+ <img
72
+ src="https://hfviewer.com/api/card.svg?source=amazon%2Fchronos-2&amp;v=20260501clipcard"
73
+ alt="Open amazon/chronos-2 in hfviewer"
74
+ width="100%"
75
+ />
76
+ </a>
77
+
78
  ## Highlights
79
 
80
  - **Per-tensor INT8** activations + **per-channel INT8** weights via
81
  embedl-deploy's PTQ flow on top of TensorRT's fused MHA kernel.
82
  No QAT or distillation needed.
83
  - **Drop-in replacement** for `amazon/chronos-2` inference: same
84
+ `(context, group_ids) → quantile_preds` signature; 21 evenly
85
+ spaced quantile levels with the median at index 10.
86
  - **Validated** on the [GIFT-Eval](https://huggingface.co/datasets/Salesforce/GiftEval)
87
  benchmark across 125 task configurations. See Accuracy below.
88
  - **Two ctx variants** so you can pick the latency/history-window
 
119
  (`--noDataTransfers`), CUDA Graph + Spin Wait enabled, clocks locked
120
  (`nvpmodel -m 0 && jetson_clocks` on Jetson).
121
 
122
+ ### Jetson AGX Orin (MAXN)
123
 
124
  #### ctx=512
125
 
 
180
  ship with the artifacts; check them before deploying to a domain
181
  that resembles those outliers.
182
 
183
  ## Creating Your Own Optimized Models
184
 
185
  This artifact was produced with
 
202
  co-marketing opportunities. Reach out at
203
  [contact@embedl.com](mailto:contact@embedl.com), or open an issue on
204
  [GitHub](https://github.com/embedl/embedl-deploy).
205
+
206
+ <!-- embedl-discord-banner:start -->
207
+ <style>
208
+ .embedl-discord-btn { transition: background 160ms ease, box-shadow 160ms ease; }
209
+ .embedl-discord-btn:hover { background: #6C77F5 !important; box-shadow: 0 8px 22px rgba(88,101,242,0.55) !important; }
210
+ </style>
211
+ <div style="background:radial-gradient(600px 220px at 0% 50%,rgba(88,101,242,0.22) 0%,rgba(88,101,242,0) 60%),radial-gradient(400px 180px at 100% 100%,rgba(88,101,242,0.10) 0%,rgba(88,101,242,0) 55%),linear-gradient(135deg,#0B1626 0%,#142338 100%);border:1px solid rgba(88,101,242,0.35);border-radius:12px;padding:22px 24px;margin:24px 0 0 0;color:#F2F6FA;box-shadow:0 4px 16px rgba(11,22,38,0.18);overflow:hidden;box-sizing:border-box;max-width:100%;">
212
+ <table style="width:100%;border-collapse:collapse;border:0;background:transparent;">
213
+ <tr style="background:transparent;">
214
+ <td style="vertical-align:middle;border:0;padding:0;background:transparent;">
215
+ <div style="display:inline-block;font-size:10px;letter-spacing:0.08em;text-transform:uppercase;font-weight:700;color:#A5B4FC;background:rgba(88,101,242,0.18);border:1px solid rgba(88,101,242,0.45);padding:4px 10px;border-radius:999px;margin-bottom:10px;white-space:nowrap;">Community &amp; support</div>
216
+ <div style="font-size:15px;font-weight:700;line-height:1.35;color:#F2F6FA;margin-bottom:4px;">Need help with this model? Chat with the Embedl team and other engineers on <span style="color:#A5B4FC;white-space:nowrap;">Discord</span>.</div>
217
+ <div style="font-size:13px;color:#9BA7B5;">Quantization gotchas, hardware questions, fine-tuning tips — bring them all.</div>
218
+ </td>
219
+ <td width="1%" style="vertical-align:middle;border:0;padding:0 0 0 18px;white-space:nowrap;text-align:right;background:transparent;">
220
+ <a href="https://discord.gg/MTbMWdKqE" class="embedl-discord-btn" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #5865F2;background:#5865F2;color:#FFFFFF;text-decoration:none;box-shadow:0 6px 18px rgba(88,101,242,0.35);">Join our Discord →</a>
221
+ </td>
222
+ </tr>
223
+ </table>
224
+ </div>
225
+ <!-- embedl-discord-banner:end -->
infer_trt.py CHANGED
@@ -22,9 +22,8 @@ from pathlib import Path
22
  import numpy as np
23
  import onnxruntime as ort
24
 
25
- # Native output grid emitted by the model: 21 quantiles spaced 0.01..0.99
26
- # step 0.05; the index of the median (q=0.5) is element 10.
27
- QUANTILE_LEVELS = [round(0.01 + 0.05 * i, 2) for i in range(21)]
28
  MEDIAN_IDX = 10
29
  NUM_OUTPUT_PATCHES = 64 # baked into the ONNX
30
  OUTPUT_PATCH_SIZE = 16 # baked into the ONNX
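
A quick check suggests why the `QUANTILE_LEVELS` line was dropped rather than kept: the removed expression does not produce the grid its own comment describes. The sketch below only re-evaluates that expression as written on the old side of this diff; it makes no claim about the levels chronos-2 actually emits.

```python
# Re-evaluate the expression removed from infer_trt.py, exactly as it
# appeared on the old side of this diff.
removed_levels = [round(0.01 + 0.05 * i, 2) for i in range(21)]

# The old comment promised a 0.01..0.99 grid with the median (q=0.5) at
# index 10. Neither holds for this expression.
print(len(removed_levels))   # number of levels
print(removed_levels[10])    # value at MEDIAN_IDX
print(removed_levels[-1])    # top level; a valid quantile must be below 1
```

Index 10 evaluates to 0.51 and the last entry to 1.01, so the list could not serve as a quantile grid. Keeping only `MEDIAN_IDX` avoids hard-coding an incorrect grid.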
 
22
  import numpy as np
23
  import onnxruntime as ort
24
 
25
+ # chronos-2 emits 21 evenly spaced quantile levels along axis 1 of
26
+ # the output. The median (q=0.5) is element 10.
27
  MEDIAN_IDX = 10
28
  NUM_OUTPUT_PATCHES = 64 # baked into the ONNX
29
  OUTPUT_PATCH_SIZE = 16 # baked into the ONNX
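
With the grid constant gone, downstream code relies on `MEDIAN_IDX` alone. Below is a minimal sketch of pulling the median forecast out of the model output, assuming a `(batch, 21, horizon)` array layout. The diff only states that the quantiles lie along axis 1; the batch-first layout, the `median_forecast` helper, and the dummy array are illustrative and not part of `infer_trt.py`.

```python
import numpy as np

MEDIAN_IDX = 10
NUM_OUTPUT_PATCHES = 64  # baked into the ONNX
OUTPUT_PATCH_SIZE = 16   # baked into the ONNX
HORIZON = NUM_OUTPUT_PATCHES * OUTPUT_PATCH_SIZE  # 1024 forecast steps
NUM_QUANTILES = 21


def median_forecast(quantile_preds: np.ndarray) -> np.ndarray:
    """Return the median (q=0.5) slice of a quantile forecast.

    Hypothetical helper: expects shape (batch, 21, horizon), with the
    quantile levels along axis 1 as described in the updated comment.
    """
    if quantile_preds.ndim != 3 or quantile_preds.shape[1] != NUM_QUANTILES:
        raise ValueError(
            f"expected (batch, {NUM_QUANTILES}, horizon), "
            f"got {quantile_preds.shape}"
        )
    return quantile_preds[:, MEDIAN_IDX, :]


# Stand-in for a real model output: zeros everywhere except the median row.
dummy = np.zeros((2, NUM_QUANTILES, HORIZON), dtype=np.float32)
dummy[:, MEDIAN_IDX, :] = 1.0

median = median_forecast(dummy)
print(median.shape)
```

The shape check is there because an output with a different quantile count would otherwise be sliced silently at the wrong level.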