---
tags:
  - astronomy
  - time-series
  - light-curves
  - variable-stars
  - onnx
library_name: onnx
license: cc-by-4.0
---

# AstroM3 (photo encoder)

**HuggingFace:** [light-curve/astrom3](https://huggingface.co/light-curve/astrom3)

## Paper

Rizhko, M. & Bloom, J. S. (2024). *AstroM³: A self-supervised multimodal model for astronomy*. arXiv:2411.08842.

```bibtex
@article{rizhko2024astrom3,
  author = {Rizhko, Mariia and Bloom, Joshua S.},
  title = {{AstroM³}: A self-supervised multimodal model for astronomy},
  journal = {arXiv preprint arXiv:2411.08842},
  year = {2024}
}
```

## Original code

<https://github.com/MeriDK/AstroM3> (git submodule at `models/astrom3/code/`)

## License

- **Code** (this repository): MIT — see [LICENSE](LICENSE).
- **Model weights** (`AstroMLCore/AstroM3-CLIP-photo`): Creative Commons Attribution 4.0 (CC BY 4.0).

## Model overview

AstroM3 is a self-supervised multimodal contrastive model for variable-star classification that jointly trains photometry (light-curve), spectra, and metadata encoders using a CLIP-style objective. This integration exports the **photo-only encoder** from the pretrained CLIP checkpoint (`AstroMLCore/AstroM3-CLIP-photo`) as an ONNX embedding model.

The photo encoder is an [Informer](https://ojs.aaai.org/index.php/AAAI/article/view/17325/17132) transformer (ProbSparse attention, 8 layers, d_model=128) trained on ASAS-SN V-band variable-star light curves (`AstroMLCore/AstroM3Processed`). For ONNX export, the ProbSparse attention layers are replaced with standard scaled dot-product attention, which is fully ONNX-exportable. ProbSparse is a sampling-based sparse approximation of this dense attention, so the exported embeddings may differ slightly from those of the original PyTorch model.
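For reference, dense scaled dot-product attention computes `softmax(Q Kᵀ / √d) V` for every query, whereas ProbSparse (per the Informer paper) evaluates it only for a selected subset of queries. A minimal single-head NumPy sketch follows; it has no padding mask and no learned projections, both simplifications for illustration rather than a description of the exported graph:

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Dense attention: softmax(q @ k.T / sqrt(d)) @ v.

    q: (L_q, d), k: (L_k, d), v: (L_k, d_v). Single head, no masking.
    Returns the outputs (L_q, d_v) and the attention weights (L_q, L_k).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # (L_q, L_k)
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
q = rng.normal(size=(200, 16))  # 200 timesteps, as in the model input
k = rng.normal(size=(200, 16))
v = rng.normal(size=(200, 16))
out, w = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (200, 16)
```

A quick sanity check: if all keys are identical, every weight row is uniform and each output equals the mean of `v`.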

## Inputs

| Tensor | Shape | Description |
|--------|-------|-------------|
| `x_enc` | `[batch, 200, 9]` | Padded photometry features (9 channels per timestep — see preprocessing) |
| `mask` | `[batch, 200]` | `1` for valid timesteps, `0` for padding |

## Outputs (ONNX)

Single file `astrom3.onnx` with two named outputs:

| Output | Shape | Aggregation |
|--------|-------|-------------|
| `mean` | `[batch, 128]` | Masked mean pool of encoder outputs |
| `sequence` | `[batch, 200, 128]` | Full per-timestep encoder outputs (unmasked) |
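A minimal sketch of running the exported model with `onnxruntime` and of recomputing `mean` as a masked average of `sequence`. The helper names `run_astrom3` and `masked_mean` are hypothetical; the tensor names come from the tables above. The assumption that the graph pools exactly as `sum(sequence * mask) / sum(mask)` (no extra epsilon or normalization) has not been verified against `astrom3.onnx`:

```python
import numpy as np


def masked_mean(sequence, mask):
    """Average per-timestep embeddings over valid positions only.

    sequence: (batch, T, D); mask: (batch, T) with 1 = valid, 0 = padding.
    Assumed to mirror the `mean` output; not verified against the graph.
    """
    m = mask.astype(sequence.dtype)[..., None]  # (batch, T, 1)
    total = (sequence * m).sum(axis=1)          # (batch, D)
    count = np.clip(m.sum(axis=1), 1.0, None)   # guard against all-padding rows
    return total / count


def run_astrom3(onnx_path, x_enc, mask):
    """Run the exported encoder on CPU; returns (mean, sequence)."""
    import onnxruntime as ort  # imported lazily so masked_mean works without it

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    mean, sequence = session.run(
        ["mean", "sequence"],
        {"x_enc": x_enc.astype(np.float32), "mask": mask.astype(np.float32)},
    )
    return mean, sequence
```

With the real file, `mean, seq = run_astrom3("astrom3.onnx", x_enc, mask)` followed by `np.allclose(mean, masked_mean(seq, mask), atol=1e-5)` should hold if the pooling assumption is right. Because `sequence` is unmasked, padded positions carry values and must be excluded from any pooling of your own.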

## Preprocessing steps

The 9 input channels per timestep are built by `preprocess_lc()` in the
upstream dataset (`AstroMLCore/AstroM3Dataset`):

| Index | Feature | How obtained |
|-------|---------|--------------|
| 0 | `time` (HJD scaled to [0, 1]) | per-observation |
| 1 | `flux` = `(flux − mean) / MAD` | per-observation |
| 2 | `flux_err` = `flux_err / MAD` | per-observation |
| 3 | `amplitude` | **ASAS-SN catalog scalar, replicated to every timestep** |
| 4 | `period` | **ASAS-SN catalog scalar, replicated** |
| 5 | `lksl_statistic` (Lafler-Kinman string length) | **ASAS-SN catalog scalar, replicated** |
| 6 | `rfr_score` (Random Forest Regressor R² for phase-folded LC) | **ASAS-SN catalog scalar, replicated** |
| 7 | `log10(MAD_flux)` | global scalar computed from LC, replicated |
| 8 | `delta_t` = `(max_HJD − min_HJD) / 365` | global scalar computed from LC, replicated |

Features 3–6 come directly from the ASAS-SN V-band variable-star catalog
(Jayasinghe et al. 2019) and are **not recomputed** from the light curve by
this codebase. Users applying this model to non-ASAS-SN data must provide
equivalent values (e.g. run a Lomb-Scargle period finder and compute
peak-to-peak amplitude themselves).
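The sketch below shows one way to compute stand-ins for `period`, `amplitude`, and `lksl_statistic` with NumPy only: a classical Lomb-Scargle periodogram on a linear frequency grid, a max − min amplitude, and one common form of the Lafler-Kinman string length (with a wrap-around term). All function names are hypothetical, and none of these is guaranteed to reproduce the ASAS-SN catalog's exact definitions, grids, or units; match the catalog's units and tune `f_min`/`f_max` to the periods you expect. `rfr_score` is not covered here; see the substitution study below.

```python
import numpy as np


def lomb_scargle_power(t, y, freqs):
    """Classical Lomb-Scargle periodogram of y(t) at freqs (cycles per unit of t)."""
    y = y - y.mean()
    omega = 2 * np.pi * np.asarray(freqs)[:, None]              # (F, 1)
    # tau makes the sine and cosine terms orthogonal at each frequency.
    tau = np.arctan2(np.sin(2 * omega * t).sum(axis=1),
                     np.cos(2 * omega * t).sum(axis=1)) / (2 * omega[:, 0])
    arg = omega * (t - tau[:, None])                            # (F, N)
    c, s = np.cos(arg), np.sin(arg)
    return 0.5 * ((c @ y) ** 2 / (c ** 2).sum(axis=1)
                  + (s @ y) ** 2 / (s ** 2).sum(axis=1))


def best_period(t, y, f_min=0.05, f_max=5.0, n_freq=5000):
    """Period (same units as t) at the highest peak on a linear frequency grid."""
    freqs = np.linspace(f_min, f_max, n_freq)
    return 1.0 / freqs[np.argmax(lomb_scargle_power(t, y, freqs))]


def peak_to_peak_amplitude(y):
    """Simple max - min amplitude; sensitive to outliers."""
    return float(np.max(y) - np.min(y))


def lafler_kinman(t, y, period):
    """String length of the curve folded at `period` (one common form): sum of
    squared successive differences in phase order, including the wrap-around
    pair, divided by the total squared deviation from the mean."""
    phase = (t / period) % 1.0
    y_sorted = y[np.argsort(phase)]
    diffs = np.roll(y_sorted, -1) - y_sorted
    return float((diffs ** 2).sum() / ((y - y.mean()) ** 2).sum())


# Synthetic example: 0.5-day sinusoid, irregularly sampled over 100 days.
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0, 100, 300))
y = np.sin(2 * np.pi * t / 0.5) + rng.normal(0, 0.1, t.size)

period = best_period(t, y)
print(round(period, 3), peak_to_peak_amplitude(y), lafler_kinman(t, y, period))
```

The string length is small when the fold is coherent, so it is lower at the true period than at an unrelated trial period.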

Preprocessing recipe for a single light curve:

1. Deduplicate and sort observations by HJD.
2. Compute `mean` and `MAD` of the flux column; normalize flux and flux_err.
3. Scale HJD to [0, 1] over the span of the light curve.
4. Compute `log10(MAD_flux)` and `delta_t = (max_HJD − min_HJD) / 365`.
5. Obtain `amplitude`, `period`, `lksl_statistic`, `rfr_score` from the
   ASAS-SN catalog (or compute equivalents).
6. Tile the 6 global scalars across all timesteps; concatenate with columns
   0–2 to produce an `(N, 9)` array.
7. Pad or center-crop to 200 timesteps; set `mask = 0` for padded positions.
8. Use `float32` for all tensors.
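The recipe can be sketched in NumPy as follows. `preprocess_light_curve` is a hypothetical name, not the upstream `preprocess_lc()`, and several details are assumptions to check against the upstream code: MAD is taken as the unscaled median absolute deviation about the median, padding is appended at the end, and duplicate HJDs keep the first observation.

```python
import numpy as np

SEQ_LEN = 200  # fixed input length expected by the ONNX model


def preprocess_light_curve(hjd, flux, flux_err, amplitude, period, lksl, rfr_score):
    """Build (x_enc, mask) for one light curve; shapes (200, 9) and (200,).

    Assumption: MAD = median(|flux - median(flux)|), unscaled.
    """
    hjd, flux, flux_err = (np.asarray(a, dtype=np.float64) for a in (hjd, flux, flux_err))

    # 1. Deduplicate and sort by HJD (np.unique returns sorted values, first indices).
    hjd, idx = np.unique(hjd, return_index=True)
    flux, flux_err = flux[idx], flux_err[idx]

    # 2. Normalize flux and flux_err by the mean and MAD of the flux.
    mean = flux.mean()
    mad = np.median(np.abs(flux - np.median(flux)))
    if mad == 0:
        raise ValueError("MAD of flux is zero; cannot normalize")
    flux_n = (flux - mean) / mad
    err_n = flux_err / mad

    # 3. Scale HJD to [0, 1] over the span of the light curve.
    span = hjd.max() - hjd.min()
    time_n = (hjd - hjd.min()) / span if span > 0 else np.zeros_like(hjd)

    # 4-5. Global scalars in channel order 3..8: catalog features, log10(MAD), delta_t.
    scalars = np.array([amplitude, period, lksl, rfr_score, np.log10(mad), span / 365.0])

    # 6. Tile the scalars over all timesteps and join with channels 0-2 -> (N, 9).
    x = np.column_stack([time_n, flux_n, err_n, np.tile(scalars, (hjd.size, 1))])

    # 7. Center-crop or zero-pad to SEQ_LEN; mask is 1 for real observations.
    n = x.shape[0]
    if n >= SEQ_LEN:
        start = (n - SEQ_LEN) // 2
        x = x[start:start + SEQ_LEN]
        mask = np.ones(SEQ_LEN)
    else:
        x = np.vstack([x, np.zeros((SEQ_LEN - n, x.shape[1]))])
        mask = np.concatenate([np.ones(n), np.zeros(SEQ_LEN - n)])

    # 8. float32 for all tensors.
    return x.astype(np.float32), mask.astype(np.float32)
```

For a batch, stack the per-light-curve results with `np.stack` to get `x_enc` of shape `(batch, 200, 9)` and `mask` of shape `(batch, 200)`.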

## Weights

Source: <https://huggingface.co/AstroMLCore/AstroM3-CLIP-photo>

The `model.safetensors` file is a standalone Informer checkpoint (classification head present but unused; loaded with `strict=False`).

Dataset: ASAS-SN V-band variable-star light curves (`AstroMLCore/AstroM3Processed`).

## Applying the model without ASAS-SN catalog features

Features 3–6 require the ASAS-SN catalog. To guide users applying the model to
other surveys, we measured how much the mean embedding changes when each of
these features is replaced by a substitute value; `rfr_score` was studied in detail.

### rfr_score substitution

`rfr_score` is the R² of a Random Forest Regressor fit to the phase-folded
light curve; it quantifies period quality
(Jayasinghe et al. 2019, MNRAS 486 1907, §5; arXiv:1809.07329).
In the ASAS-SN test set it ranges from −3.5 to 1.18 (median ≈ 0.38).

Setting all timesteps to the constant **0.392** (the empirical optimum, close
to the dataset median) minimizes the mean cosine distance from the
true-feature embeddings:

| Metric | Value |
|--------|-------|
| Overall mean cosine distance | 0.049 ± 0.091 |
| Macro-average per class | 0.049 ± 0.058 |
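The two rows weight samples differently: the overall figure averages over every light curve, while the macro average first averages within each class and then across classes, so every class counts equally. A minimal sketch of both, with cosine distance defined as 1 − cos(a, b); the function names are hypothetical, and reading the ± values as population standard deviations (over samples and over per-class means, respectively) is an assumption:

```python
import numpy as np


def cosine_distance(a, b):
    """Row-wise cosine distance 1 - cos(a, b) between two (n, d) embedding arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - (a * b).sum(axis=1)


def summarize(dist, labels):
    """Overall mean/std over samples and macro mean/std over per-class means."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    per_class = np.array([dist[labels == c].mean() for c in classes])
    return {
        "overall": (dist.mean(), dist.std()),
        "macro": (per_class.mean(), per_class.std()),
        "per_class": dict(zip(classes.tolist(), per_class)),
    }
```

Here `dist` would be `cosine_distance(emb_true, emb_substituted)` for the `mean` embeddings computed with the true and the substituted feature values.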

Per-class breakdown (5 samples per class from the ASAS-SN test split):

| Class | Mean dist | Std | True rfr mean |
|-------|-----------|-----|---------------|
| EW    | 0.005 | 0.005 | −0.07 |
| SR    | 0.004 | 0.003 | +0.50 |
| EA    | 0.060 | 0.032 | +0.95 |
| RRAB  | 0.020 | 0.011 | +0.83 |
| EB    | 0.016 | 0.011 | +0.90 |
| ROT   | 0.002 | 0.002 | +0.85 |
| RRC   | 0.147 | 0.115 | −0.79 |
| HADS  | 0.016 | 0.011 | +0.59 |
| M     | 0.050 | 0.020 | +0.18 |
| DSCT  | 0.170 | 0.182 | −0.86 |

Classes whose true rfr mean is far from 0.39 (RRC, DSCT) are most affected.
Using an out-of-range value (e.g. ±100) causes cosine distances ~0.93–0.97,
so staying within the training distribution is important.
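A minimal sketch of the substitution on an already-built `x_enc`: it writes the constant into channel 6 and rejects values outside the range observed in the test set. The function name and the optional `mask` argument are hypothetical. By default it fills every timestep, as described above; passing a `mask` restricts the fill to real observations (the text above does not say whether padded positions were filled in the study).

```python
import numpy as np

RFR_CHANNEL = 6            # index of rfr_score in the 9-channel input
RFR_FILL = 0.392           # empirical optimum reported above
RFR_RANGE = (-3.5, 1.18)   # range observed in the ASAS-SN test set


def fill_rfr_score(x_enc, value=RFR_FILL, mask=None):
    """Return a float32 copy of x_enc with channel 6 set to `value`.

    x_enc: (..., 200, 9). If mask (..., 200) is given, only positions with
    mask > 0 are filled; otherwise all timesteps are filled.
    """
    lo, hi = RFR_RANGE
    if not lo <= value <= hi:
        raise ValueError(f"rfr_score substitute {value} is outside [{lo}, {hi}]")
    x = np.array(x_enc, dtype=np.float32, copy=True)  # never modify the caller's array
    if mask is None:
        x[..., RFR_CHANNEL] = value
    else:
        valid = np.asarray(mask) > 0
        x[..., RFR_CHANNEL] = np.where(valid, value, x[..., RFR_CHANNEL])
    return x
```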