Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,73 @@
---
tags:
- astronomy
- time-series
- light-curves
- onnx
library_name: onnx
---

# ATCAT

## Paper

Tung, Z. (2025). *ATCAT: Astronomical Timeseries CAusal Transformer*. arXiv:2511.00614.

```bibtex
@article{tung2025atcat,
  author = {Tung, Zora},
  title = {{ATCAT}: Astronomical Timeseries CAusal Transformer},
  journal = {arXiv preprint arXiv:2511.00614},
  year = {2025}
}
```

## Original code

<https://codeberg.org/zorat/atcat> (git submodule at `models/atcat/code/`)

## License

ATCAT is distributed upstream under a modified MIT license with a non-military-use restriction.
See [LICENSE](LICENSE) and the upstream `README.md` for the exact terms.

## Model overview

This integration exports the upstream ATCAT light-curve-only ELAsTiCC classifier as an ONNX embedding model. ATCAT is a causal transformer for irregularly sampled astronomical time series. The exported wrapper uses the real upstream light-curve embedder and transformer stack from the `lc_only(split=0)` checkpoint, and exposes hidden representations before the final classifier head.

The current export targets the upstream LC-only core model (`results/elasticc/CORE/lc_only_cv_0`). The LC+metadata variant is intentionally not wrapped yet because the upstream README notes that the saved metadata preprocessing artifacts are incomplete for out-of-the-box reuse.

## Inputs

| Tensor | Shape | Description |
|--------|-------|-------------|
| `flux` | `[batch, 243]` | Padded calibrated flux values |
| `flux_err` | `[batch, 243]` | Padded flux uncertainties |
| `time` | `[batch, 243]` | Padded observation times |
| `mask` | `[batch, 243]` | `1` for valid points, `0` for padding |
| `channel_index` | `[batch, 243]` | LSST band indices in ATCAT order: `u=0, g=1, r=2, i=3, z=4, Y=5` |
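
The layout in the table can be checked before running inference. The sketch below is a minimal, dependency-free illustration rather than part of the wrapper: the function name `check_inputs` and the use of nested Python lists are assumptions, and element dtypes are not checked because this README does not state them.

```python
EXPECTED_NAMES = ("flux", "flux_err", "time", "mask", "channel_index")
SEQ_LEN = 243  # padded sequence length from the table above


def check_inputs(feed):
    """Raise ValueError if `feed` does not match the documented input layout.

    `feed` maps each tensor name to a list of per-object rows (nested lists).
    This helper is hypothetical and only mirrors the table above.
    """
    missing = [name for name in EXPECTED_NAMES if name not in feed]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    batch = len(feed["flux"])
    for name in EXPECTED_NAMES:
        rows = feed[name]
        if len(rows) != batch:
            raise ValueError(f"{name}: batch size {len(rows)} != {batch}")
        for row in rows:
            if len(row) != SEQ_LEN:
                raise ValueError(f"{name}: row length {len(row)} != {SEQ_LEN}")
    for mask_row, band_row in zip(feed["mask"], feed["channel_index"]):
        if any(m not in (0, 1) for m in mask_row):
            raise ValueError("mask must contain only 0 and 1")
        # Only valid (mask == 1) positions need a real band index.
        if any(m == 1 and b not in range(6) for m, b in zip(mask_row, band_row)):
            raise ValueError("channel_index must be in 0..5 at valid positions")
    return batch


# One object with two valid observations in bands r and Y.
feed = {name: [[0] * SEQ_LEN] for name in EXPECTED_NAMES}
feed["mask"][0][:2] = [1, 1]
feed["channel_index"][0][:2] = [2, 5]
batch_size = check_inputs(feed)
```

A check like this catches shape and band-index mistakes early, before they surface as an opaque runtime error.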

## Outputs (ONNX)

| File | Shape | Aggregation |
|------|-------|-------------|
| `atcat_token.onnx` | `[batch, 384]` | Hidden state used by the upstream classifier (last valid token) |
| `atcat_mean.onnx` | `[batch, 384]` | Masked mean pool of transformer outputs |
| `atcat_full.onnx` | `[batch, 243, 384]` | Full padded transformer output sequence |
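
To make the three aggregations concrete, the sketch below derives a masked mean and a last-valid-token vector from one object's padded output sequence. It uses a toy hidden size of 2 instead of 384, and the function names are hypothetical. The exported graphs may compute these quantities differently internally, so treat this as an illustration of the definitions, not of the export code.

```python
def masked_mean(sequence, mask):
    """Average the hidden vectors at positions where mask == 1."""
    valid = [vec for vec, m in zip(sequence, mask) if m == 1]
    if not valid:
        raise ValueError("mask selects no valid positions")
    dim = len(valid[0])
    return [sum(vec[d] for vec in valid) / len(valid) for d in range(dim)]


def last_valid_token(sequence, mask):
    """Return the hidden vector at the last position where mask == 1."""
    for vec, m in zip(reversed(sequence), reversed(mask)):
        if m == 1:
            return vec
    raise ValueError("mask selects no valid positions")


# Toy [seq_len=4, hidden=2] output with one padded position at the end.
full = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]
mask = [1, 1, 1, 0]

mean_vec = masked_mean(full, mask)        # analogous to atcat_mean.onnx
token_vec = last_valid_token(full, mask)  # analogous to atcat_token.onnx
```

In both cases the padded position is ignored, which is why `mask` has to accompany the full sequence whenever it is pooled downstream.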

## Preprocessing steps

1. Use the upstream ATCAT ELAsTiCC-derived Parquet data format, or convert your data into the same padded, per-object sequence fields.
2. Keep each sequence in chronological order, as the upstream preprocessing expects.
3. Pad sequences to length 243 and set `mask=0` for padding positions.
4. Encode LSST bands as `u, g, r, i, z, Y -> 0, 1, 2, 3, 4, 5`.
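
For a single object, the steps above can be sketched as follows. This is a minimal illustration, not the upstream preprocessing: the helper name `pad_light_curve`, the tuple layout of the observations, the zero padding value, and the choice to raise on over-long sequences are all assumptions, and any flux or time normalization that upstream applies is not reproduced here.

```python
MAX_LEN = 243  # padded sequence length from the Inputs table
BAND_TO_INDEX = {"u": 0, "g": 1, "r": 2, "i": 3, "z": 4, "Y": 5}


def pad_light_curve(observations, max_len=MAX_LEN, pad_value=0.0):
    """Build the five padded fields for one object.

    `observations` is an iterable of (time, flux, flux_err, band) tuples.
    The tuple layout and the padding value are assumptions of this sketch.
    """
    rows = sorted(observations, key=lambda row: row[0])  # chronological order
    if len(rows) > max_len:
        raise ValueError(f"{len(rows)} observations exceed max_len={max_len}")
    n_pad = max_len - len(rows)
    return {
        "time": [float(r[0]) for r in rows] + [pad_value] * n_pad,
        "flux": [float(r[1]) for r in rows] + [pad_value] * n_pad,
        "flux_err": [float(r[2]) for r in rows] + [pad_value] * n_pad,
        "channel_index": [BAND_TO_INDEX[r[3]] for r in rows] + [0] * n_pad,
        "mask": [1] * len(rows) + [0] * n_pad,
    }


# Three out-of-order observations in bands r, g, and Y.
example = pad_light_curve([
    (60012.3, 41.0, 3.2, "r"),
    (60010.1, 35.5, 2.9, "g"),
    (60015.8, 12.4, 4.0, "Y"),
])
```

Plain lists keep the sketch dependency-free. In practice the per-object results would be stacked into `[batch, 243]` arrays before being passed to the ONNX models.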

## Weights

Source: Google Drive archive linked from the upstream ATCAT README (`atcat_derived_data.tar`)

Model path used by this wrapper:
`results/elasticc/CORE/lc_only_cv_0/checkpoints/model_40000.pt`

Dataset used by this wrapper:
`data_parquet/split_0/test_*.parquet`