Commit d677be5 by hombit (verified) · 1 parent: 972c4f5

Upload README.md with huggingface_hub

  1. README.md +73 -0
README.md ADDED
@@ -0,0 +1,73 @@
---
tags:
- astronomy
- time-series
- light-curves
- onnx
library_name: onnx
---

# ATCAT

## Paper

Tung, Z. (2025). *ATCAT: Astronomical Timeseries CAusal Transformer*. arXiv:2511.00614.

```bibtex
@article{tung2025atcat,
  author = {Tung, Zora},
  title = {{ATCAT}: Astronomical Timeseries CAusal Transformer},
  journal = {arXiv preprint arXiv:2511.00614},
  year = {2025}
}
```

## Original code

<https://codeberg.org/zorat/atcat> (git submodule at `models/atcat/code/`)

## License

ATCAT is distributed upstream under a modified MIT license with a non-military-use restriction.
See [LICENSE](LICENSE) and the upstream `README.md` for the exact terms.

## Model overview

This integration exports the upstream ATCAT light-curve-only (LC-only) ELAsTiCC classifier as an ONNX embedding model. ATCAT is a causal transformer for irregularly sampled astronomical time series. The exported wrapper reuses the upstream light-curve embedder and transformer stack from the `lc_only(split=0)` checkpoint and exposes the hidden representations that precede the final classifier head.

The current export targets the upstream LC-only core model (`results/elasticc/CORE/lc_only_cv_0`). The LC+metadata variant is intentionally not wrapped yet, because the upstream README notes that the saved metadata preprocessing artifacts are incomplete for out-of-the-box reuse.

## Inputs

| Tensor | Shape | Description |
|--------|-------|-------------|
| `flux` | `[batch, 243]` | Padded calibrated flux values |
| `flux_err` | `[batch, 243]` | Padded flux uncertainties |
| `time` | `[batch, 243]` | Padded observation times |
| `mask` | `[batch, 243]` | `1` for valid points, `0` for padding |
| `channel_index` | `[batch, 243]` | LSST band indices in ATCAT order: `u=0, g=1, r=2, i=3, z=4, Y=5` |

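The input layout above can be checked before a batch is handed to the model. The following is a minimal sketch in plain Python, assuming the batch is held as a dictionary of nested lists; `check_batch` is a hypothetical helper, not part of the upstream code or of this wrapper. The README does not say what `channel_index` should hold at padded positions, so the sketch only checks band indices where `mask` is `1`.

```python
SEQ_LEN = 243  # padded sequence length from the Inputs table
N_BANDS = 6    # u, g, r, i, z, Y -> 0..5
INPUT_NAMES = ("flux", "flux_err", "time", "mask", "channel_index")


def check_batch(batch):
    """Return the batch size if `batch` matches the documented layout.

    `batch` maps each input name to a list of rows (hypothetical container;
    the real wrapper may use arrays instead). Raises ValueError otherwise.
    """
    missing = [name for name in INPUT_NAMES if name not in batch]
    if missing:
        raise ValueError(f"missing inputs: {missing}")

    batch_size = len(batch["flux"])
    for name in INPUT_NAMES:
        rows = batch[name]
        if len(rows) != batch_size:
            raise ValueError(f"{name}: expected {batch_size} rows, got {len(rows)}")
        for row in rows:
            if len(row) != SEQ_LEN:
                raise ValueError(f"{name}: expected row length {SEQ_LEN}, got {len(row)}")

    for mask_row, band_row in zip(batch["mask"], batch["channel_index"]):
        for m, band in zip(mask_row, band_row):
            if m not in (0, 1):
                raise ValueError(f"mask must be 0 or 1, got {m!r}")
            # Band indices are only checked at valid (unpadded) positions.
            if m == 1 and not 0 <= band < N_BANDS:
                raise ValueError(f"band index out of range: {band!r}")
    return batch_size
```

A check of this kind is cheap and catches the most common mistakes (wrong padding length, band labels left as strings) before they show up as an opaque runtime shape error.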
## Outputs (ONNX)

| File | Shape | Aggregation |
|------|-------|-------------|
| `atcat_token.onnx` | `[batch, 384]` | Hidden state used by the upstream classifier (last valid token) |
| `atcat_mean.onnx` | `[batch, 384]` | Masked mean pool of transformer outputs |
| `atcat_full.onnx` | `[batch, 243, 384]` | Full padded transformer output sequence |

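The two pooled outputs can be read as reductions of the full sequence output. The sketch below illustrates that relationship for a single object in plain Python, using a toy hidden size of 2 instead of 384. It follows only the table descriptions ("last valid token", "masked mean pool"); whether the exported graphs compute exactly these operations is an assumption, and `last_valid_token` and `masked_mean` are hypothetical names.

```python
def last_valid_token(hidden, mask):
    """Return the hidden state at the last position where mask == 1.

    hidden: list of per-position vectors, shape [seq_len][dim]
    mask:   list of 0/1 flags, shape [seq_len]
    """
    valid = [i for i, m in enumerate(mask) if m == 1]
    if not valid:
        raise ValueError("sequence has no valid points")
    return list(hidden[valid[-1]])


def masked_mean(hidden, mask):
    """Average the hidden states over positions where mask == 1."""
    valid = [row for row, m in zip(hidden, mask) if m == 1]
    if not valid:
        raise ValueError("sequence has no valid points")
    dim = len(valid[0])
    return [sum(row[d] for row in valid) / len(valid) for d in range(dim)]


# Toy example: three positions, the last one is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
mask = [1, 1, 0]
print(last_valid_token(hidden, mask))  # [3.0, 4.0]
print(masked_mean(hidden, mask))       # [2.0, 3.0]
```

In both reductions the padded position is ignored, which is why the `mask` input matters even when only a pooled embedding is needed.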
## Preprocessing steps

1. Use the upstream ATCAT ELAsTiCC-derived Parquet data format, or convert your data into the same padded, per-object sequence fields.
2. Keep each sequence in chronological order, as the upstream preprocessing expects.
3. Pad sequences to length 243 and set `mask=0` at padding positions.
4. Encode LSST bands as `u, g, r, i, z, Y -> 0, 1, 2, 3, 4, 5`.

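Steps 2 to 4 can be sketched for a single object as follows. This is an illustration in plain Python, not the upstream preprocessing: `prepare_object` is a hypothetical helper, and several choices are assumptions that the README does not specify, namely right-padding, a pad value of `0.0`, a pad band index of `0`, and raising an error for objects with more than 243 observations. Any normalization of times or fluxes that upstream applies is not reproduced here.

```python
SEQ_LEN = 243
BAND_TO_INDEX = {"u": 0, "g": 1, "r": 2, "i": 3, "z": 4, "Y": 5}


def prepare_object(times, fluxes, flux_errs, bands, pad_value=0.0):
    """Sort one light curve by time, encode bands, and pad to SEQ_LEN.

    Assumptions (not confirmed by upstream): padding goes at the end,
    padded float fields hold `pad_value`, padded band indices hold 0.
    """
    # Step 2: chronological order.
    obs = sorted(zip(times, fluxes, flux_errs, bands), key=lambda o: o[0])
    if len(obs) > SEQ_LEN:
        raise ValueError(f"{len(obs)} observations exceed SEQ_LEN={SEQ_LEN}")

    n_pad = SEQ_LEN - len(obs)
    return {
        "time": [o[0] for o in obs] + [pad_value] * n_pad,
        "flux": [o[1] for o in obs] + [pad_value] * n_pad,
        "flux_err": [o[2] for o in obs] + [pad_value] * n_pad,
        # Step 3: 1 for real observations, 0 for padding.
        "mask": [1] * len(obs) + [0] * n_pad,
        # Step 4: LSST band letters to ATCAT indices.
        "channel_index": [BAND_TO_INDEX[o[3]] for o in obs] + [0] * n_pad,
    }


row = prepare_object(
    times=[3.0, 1.0, 2.0],
    fluxes=[30.0, 10.0, 20.0],
    flux_errs=[0.3, 0.1, 0.2],
    bands=["r", "u", "Y"],
)
print(row["time"][:3], row["channel_index"][:3], sum(row["mask"]))
```

Stacking the rows produced for several objects gives the `[batch, 243]` layout that the model inputs require.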
## Weights

Source: Google Drive archive linked from the upstream ATCAT README (`atcat_derived_data.tar`)

Model path used by this wrapper:
`results/elasticc/CORE/lc_only_cv_0/checkpoints/model_40000.pt`

Dataset used by this wrapper:
`data_parquet/split_0/test_*.parquet`