---
tags:
  - astronomy
  - time-series
  - light-curves
  - onnx
library_name: onnx
---

# ATCAT

## Paper

Tung, Z. (2025). *ATCAT: Astronomical Timeseries CAusal Transformer*. arXiv:2511.00614.

```bibtex
@article{tung2025atcat,
  author = {Tung, Zora},
  title = {{ATCAT}: Astronomical Timeseries CAusal Transformer},
  journal = {arXiv preprint arXiv:2511.00614},
  year = {2025}
}
```

## Original code

<https://codeberg.org/zorat/atcat> (git submodule at `models/atcat/code/`)

## License

ATCAT is distributed upstream under a modified MIT license with a non-military-use restriction.
See [LICENSE](LICENSE) and the upstream `README.md` for the exact terms.

## Model overview

This integration exports the upstream ATCAT light-curve-only ELAsTiCC classifier as an ONNX embedding model. ATCAT is a causal transformer for irregularly sampled astronomical time series. The exported wrapper reuses the upstream light-curve embedder and transformer stack from the `lc_only(split=0)` checkpoint unchanged, and exposes the hidden representations that feed the final classifier head rather than the class predictions.

The current export targets the upstream LC-only core model (`results/elasticc/CORE/lc_only_cv_0`). The LC+metadata variant is intentionally not wrapped yet because the upstream README notes that the saved metadata preprocessing artifacts are incomplete for out-of-the-box reuse.

## Inputs

| Tensor | Shape | Description |
|--------|-------|-------------|
| `flux` | `[batch, 243]` | Padded calibrated flux values |
| `flux_err` | `[batch, 243]` | Padded flux uncertainties |
| `time` | `[batch, 243]` | Padded observation times |
| `mask` | `[batch, 243]` | `1` for valid points, `0` for padding |
| `channel_index` | `[batch, 243]` | LSST band indices in ATCAT order: `u=0, g=1, r=2, i=3, z=4, Y=5` |
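
As a hedged illustration of how these inputs might be fed to one of the exported files, the sketch below validates a feed dictionary against the table above and then runs it with ONNX Runtime. The helper names `check_feed` and `run_embedding` are hypothetical, and the sketch assumes the exported graph has a single output. The element types are not stated in this card, so the sketch prints what the graph declares instead of assuming them.

```python
import numpy as np

# Input names and padded length as listed in the Inputs table.
INPUT_NAMES = ("flux", "flux_err", "time", "mask", "channel_index")
SEQ_LEN = 243


def check_feed(feed):
    """Hypothetical helper: confirm every input is present with shape [batch, 243]."""
    batch_sizes = set()
    for name in INPUT_NAMES:
        if name not in feed:
            raise KeyError(f"missing input tensor: {name}")
        arr = np.asarray(feed[name])
        if arr.ndim != 2 or arr.shape[1] != SEQ_LEN:
            raise ValueError(
                f"{name} has shape {arr.shape}, expected [batch, {SEQ_LEN}]"
            )
        batch_sizes.add(arr.shape[0])
    if len(batch_sizes) != 1:
        raise ValueError("inputs disagree on batch size")
    return batch_sizes.pop()


def run_embedding(model_path, feed):
    """Hypothetical helper: run one exported file (e.g. atcat_mean.onnx) on CPU."""
    # Imported lazily so the shape check above works without onnxruntime installed.
    import onnxruntime as ort

    check_feed(feed)
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    # Print the declared names, dtypes, and shapes; cast the feed arrays to match.
    for inp in session.get_inputs():
        print(inp.name, inp.type, inp.shape)
    # Assumption: the graph exposes a single output tensor.
    return session.run(None, feed)[0]
```

A call such as `run_embedding("atcat_mean.onnx", feed)` would then be expected to return an array of shape `[batch, 384]`, provided the arrays in `feed` use the dtypes the graph reports.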

## Outputs (ONNX)

| File | Shape | Aggregation |
|------|-------|-------------|
| `atcat_token.onnx` | `[batch, 384]` | Hidden state used by the upstream classifier (last valid token) |
| `atcat_mean.onnx` | `[batch, 384]` | Masked mean pool of transformer outputs |
| `atcat_full.onnx` | `[batch, 243, 384]` | Full padded transformer output sequence |
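
To make the two pooled aggregations concrete, here is a minimal NumPy sketch of how a last-valid-token selection and a masked mean could be computed from a full `[batch, seq, hidden]` output and its mask. The function names are hypothetical, and this is an illustration of the two operations, not the wrapper's actual implementation. It assumes every row has at least one valid point.

```python
import numpy as np


def masked_mean(hidden, mask):
    """Average hidden states over positions where mask == 1."""
    weights = mask.astype(hidden.dtype)[..., None]  # [batch, seq, 1]
    total = (hidden * weights).sum(axis=1)          # [batch, hidden]
    # Guard against division by zero for rows with no valid positions.
    count = np.maximum(weights.sum(axis=1), 1.0)    # [batch, 1]
    return total / count


def last_valid_token(hidden, mask):
    """Pick the hidden state at the last position where mask == 1."""
    seq_len = mask.shape[1]
    # argmax on the reversed mask finds the first valid position from the end.
    last = seq_len - 1 - np.argmax(mask[:, ::-1] > 0, axis=1)
    return hidden[np.arange(hidden.shape[0]), last]


# Toy example: batch of 2, sequence length 4, hidden size 3.
hidden = np.arange(24, dtype=np.float32).reshape(2, 4, 3)
mask = np.array([[1, 1, 0, 0], [1, 1, 1, 1]], dtype=np.float32)
pooled = masked_mean(hidden, mask)     # shape (2, 3)
token = last_valid_token(hidden, mask)  # shape (2, 3)
```

With the real outputs, `hidden` would have shape `[batch, 243, 384]` and both functions would return `[batch, 384]`.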

## Preprocessing steps

1. Use the upstream ATCAT ELAsTiCC-derived Parquet data format, or convert your data into the same padded, per-object sequence fields.
2. Keep each sequence in chronological order, as the upstream preprocessing expects.
3. Pad sequences to length 243 and set `mask=0` for padding positions.
4. Encode LSST bands as `u, g, r, i, z, Y -> 0, 1, 2, 3, 4, 5`.
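
Steps 2 to 4 can be sketched for a single object as follows. This is a minimal sketch under stated assumptions, not the upstream preprocessing: the helper name `pad_light_curve` is hypothetical, padding is placed at the end with zeros, and the dtypes (`float32` for values and mask, `int64` for band indices) are guesses that should be checked against the exported graph. The card does not say how sequences longer than 243 points are handled, so the sketch raises an error rather than truncating.

```python
import numpy as np

MAX_LEN = 243  # padded sequence length from the Inputs table
BAND_TO_INDEX = {"u": 0, "g": 1, "r": 2, "i": 3, "z": 4, "Y": 5}


def pad_light_curve(time, flux, flux_err, bands, max_len=MAX_LEN):
    """Hypothetical helper: sort one object's points by time and right-pad."""
    time = np.asarray(time, dtype=np.float64)
    n = time.shape[0]
    if n > max_len:
        # Handling of over-long sequences is not specified; fail loudly.
        raise ValueError(f"{n} observations exceed max_len={max_len}")
    order = np.argsort(time, kind="stable")  # chronological order (step 2)
    channel = np.array([BAND_TO_INDEX[b] for b in bands], dtype=np.int64)

    def pad(values, dtype):
        out = np.zeros(max_len, dtype=dtype)  # assumed pad value: 0
        out[:n] = np.asarray(values)[order].astype(dtype)
        return out

    return {
        "flux": pad(flux, np.float32),
        "flux_err": pad(flux_err, np.float32),
        "time": pad(time, np.float32),
        "mask": pad(np.ones(n), np.float32),      # 1 = valid, 0 = padding
        "channel_index": pad(channel, np.int64),  # padding reuses index 0
    }


example = pad_light_curve(
    time=[3.0, 1.0, 2.0],
    flux=[30.0, 10.0, 20.0],
    flux_err=[0.3, 0.1, 0.2],
    bands=["r", "u", "Y"],
)
```

Per-object dictionaries can then be combined into `[batch, 243]` arrays with `np.stack`, one key at a time.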

## Weights

Source: Google Drive archive linked from the upstream ATCAT README (`atcat_derived_data.tar`)

Model path used by this wrapper:
`results/elasticc/CORE/lc_only_cv_0/checkpoints/model_40000.pt`

Dataset used by this wrapper:
`data_parquet/split_0/test_*.parquet`