---
tags:
- astronomy
- time-series
- light-curves
- variable-stars
- onnx
library_name: onnx
license: cc-by-4.0
---
# AstroM3 (photo encoder)
**HuggingFace:** [light-curve/astrom3](https://huggingface.co/light-curve/astrom3)
## Paper
Rizhko, M. & Bloom, J. S. (2024). *AstroM³: A self-supervised multimodal model for astronomy*. arXiv:2411.08842.
```bibtex
@article{rizhko2024astrom3,
author = {Rizhko, Mariia and Bloom, Joshua S.},
title = {{AstroM³}: A self-supervised multimodal model for astronomy},
journal = {arXiv preprint arXiv:2411.08842},
year = {2024}
}
```
## Original code
<https://github.com/MeriDK/AstroM3> (git submodule at `models/astrom3/code/`)
## License
- **Code** (this repository): MIT — see [LICENSE](LICENSE).
- **Model weights** (`AstroMLCore/AstroM3-CLIP-photo`): Creative Commons Attribution 4.0 (CC BY 4.0).
## Model overview
AstroM3 is a self-supervised multimodal contrastive model for variable-star classification that jointly trains photometry (light-curve), spectra, and metadata encoders using a CLIP-style objective. This integration exports the **photo-only encoder** from the pretrained CLIP checkpoint (`AstroMLCore/AstroM3-CLIP-photo`) as an ONNX embedding model.
The photo encoder is an [Informer](https://ojs.aaai.org/index.php/AAAI/article/view/17325/17132) transformer (ProbSparse attention, 8 layers, d_model=128) trained on ASAS-SN V-band variable-star light curves. For ONNX export, the ProbSparse attention layers are replaced with standard scaled dot-product attention, which is equivalent in expectation and fully ONNX-exportable.
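For reference, standard scaled dot-product attention computes `softmax(Q Kᵀ / sqrt(d)) V`. The NumPy sketch below is a single-head illustration only; the function name and the optional `key_mask` argument are hypothetical, and this card does not state whether the exported encoder masks padded keys inside attention.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v, key_mask=None):
    """Full softmax attention: softmax(q @ k.T / sqrt(d)) @ v.

    q, k, v: arrays of shape [seq_len, d].
    key_mask: optional [seq_len] array, 1 for valid keys and 0 for padding
              (illustrative; at least one key must be valid).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # [seq_len, seq_len]
    if key_mask is not None:
        # Padded keys get -inf so they receive zero attention weight.
        scores = np.where(np.asarray(key_mask)[None, :] > 0, scores, -np.inf)
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Every query attends to every key here, so the cost is quadratic in sequence length; at 200 timesteps that is small.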
## Inputs
| Tensor | Shape | Description |
|--------|-------|-------------|
| `x_enc` | `[batch, 200, 9]` | Padded photometry features (9 channels per timestep — see preprocessing) |
| `mask` | `[batch, 200]` | `1` for valid timesteps, `0` for padding |
## Outputs (ONNX)
Single file `astrom3.onnx` with two named outputs:
| Output | Shape | Aggregation |
|--------|-------|-------------|
| `mean` | `[batch, 128]` | Masked mean pool of encoder outputs |
| `sequence` | `[batch, 200, 128]` | Full per-timestep encoder outputs (unmasked) |
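A minimal sketch of calling the model with the tensor names listed above. `embed` and `masked_mean` are hypothetical helpers, not part of this repository; `session` is assumed to behave like `onnxruntime.InferenceSession`, whose `run(output_names, input_feed)` method is used. `masked_mean` shows what a masked mean pool over `sequence` looks like; that it matches the exported `mean` output to numerical tolerance is an assumption.

```python
import numpy as np

SEQ_LEN, N_FEATURES, EMB_DIM = 200, 9, 128


def embed(session, x_enc, mask):
    """Run the exported encoder and return its two named outputs."""
    x_enc = np.asarray(x_enc, dtype=np.float32)
    mask = np.asarray(mask, dtype=np.float32)
    if x_enc.ndim != 3 or x_enc.shape[1:] != (SEQ_LEN, N_FEATURES):
        raise ValueError(f"x_enc must be [batch, 200, 9], got {x_enc.shape}")
    if mask.shape != x_enc.shape[:2]:
        raise ValueError(f"mask must be [batch, 200], got {mask.shape}")
    mean, sequence = session.run(
        ["mean", "sequence"], {"x_enc": x_enc, "mask": mask}
    )
    return {"mean": mean, "sequence": sequence}


def masked_mean(sequence, mask):
    """Average per-timestep outputs over valid timesteps only."""
    weights = np.asarray(mask, dtype=np.float32)[..., None]  # [batch, 200, 1]
    counts = np.maximum(weights.sum(axis=1), 1.0)            # avoid divide-by-zero
    return (sequence * weights).sum(axis=1) / counts
```

With onnxruntime installed, `session = onnxruntime.InferenceSession("astrom3.onnx")` would supply the real session. Because `sequence` is unmasked, rows at padded positions should be ignored (or pooled with `mask`) downstream.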
## Preprocessing steps
The 9 input channels per timestep are built by `preprocess_lc()` in the
upstream dataset (`AstroMLCore/AstroM3Dataset`):
| Index | Feature | How obtained |
|-------|---------|--------------|
| 0 | `time` (HJD scaled to [0, 1]) | per-observation |
| 1 | `flux` = `(flux − mean) / MAD` | per-observation |
| 2 | `flux_err` = `flux_err / MAD` | per-observation |
| 3 | `amplitude` | **ASAS-SN catalog scalar, replicated to every timestep** |
| 4 | `period` | **ASAS-SN catalog scalar, replicated** |
| 5 | `lksl_statistic` (Lafler-Kinman string length) | **ASAS-SN catalog scalar, replicated** |
| 6 | `rfr_score` (Random Forest Regressor R² for phase-folded LC) | **ASAS-SN catalog scalar, replicated** |
| 7 | `log10(MAD_flux)` | global scalar computed from LC, replicated |
| 8 | `delta_t` = `(max_HJD − min_HJD) / 365` | global scalar computed from LC, replicated |
Features 3–6 come directly from the ASAS-SN V-band variable-star catalog
(Jayasinghe et al. 2019) and are **not recomputed** from the light curve by
this codebase. Users applying this model to non-ASAS-SN data must provide
equivalent values (e.g. run a Lomb-Scargle period finder and compute
peak-to-peak amplitude themselves).
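As a rough illustration of computing such equivalents, the sketch below derives a peak-to-peak amplitude and a Lafler-Kinman string-length statistic for a given trial period. Both function names are hypothetical, and the normalisation used for the statistic (squared differences of phase-adjacent points with wrap-around, divided by the total squared deviation, times `(N - 1) / (2 N)`) is one common form and an assumption here; the ASAS-SN pipeline may define amplitude and the statistic differently, so values are not guaranteed to match the catalog. The period itself must come from a separate period search, which is not shown.

```python
import numpy as np


def peak_to_peak_amplitude(values):
    """Peak-to-peak amplitude: maximum minus minimum of the measurements."""
    values = np.asarray(values, dtype=float)
    return float(values.max() - values.min())


def lafler_kinman(time, values, period):
    """Lafler-Kinman string-length statistic for one trial period.

    Smaller values mean the phase-folded curve is smoother, i.e. the
    trial period describes the data better. The normalisation is an
    assumed common form, not taken from the ASAS-SN code.
    """
    time = np.asarray(time, dtype=float)
    values = np.asarray(values, dtype=float)
    phase = (time / period) % 1.0
    v = values[np.argsort(phase)]     # measurements ordered by phase
    diffs = np.roll(v, -1) - v        # includes the last -> first wrap-around
    n = v.size
    scatter = ((v - v.mean()) ** 2).sum()
    return float((diffs ** 2).sum() / scatter * (n - 1) / (2 * n))
```

For a clean periodic signal the statistic is close to 0 at the true period and close to 1 at an unrelated period.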
Preprocessing recipe for a single light curve:
1. Deduplicate and sort observations by HJD.
2. Compute `mean` and `MAD` of the flux column; normalize flux and flux_err.
3. Scale HJD to [0, 1] over the span of the light curve.
4. Compute `log10(MAD_flux)` and `delta_t = (max_HJD − min_HJD) / 365`.
5. Obtain `amplitude`, `period`, `lksl_statistic`, `rfr_score` from the
ASAS-SN catalog (or compute equivalents).
6. Tile the 6 global scalars across all timesteps; concatenate with columns
0–2 to produce an `(N, 9)` array.
7. Pad or center-crop to 200 timesteps; set `mask = 0` for padded positions.
8. Use `float32` for all tensors.
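The recipe above can be sketched in NumPy as follows. This is not the upstream `preprocess_lc()`; `build_astrom3_input` is a hypothetical helper that follows the steps as written. Assumptions: `MAD` is the median absolute deviation about the median, duplicates are detected on HJD alone, and short light curves are zero-padded at the end.

```python
import numpy as np

SEQ_LEN = 200


def build_astrom3_input(hjd, flux, flux_err, amplitude, period,
                        lksl_statistic, rfr_score, seq_len=SEQ_LEN):
    """Return (x_enc [seq_len, 9], mask [seq_len]) for one light curve."""
    hjd = np.asarray(hjd, dtype=np.float64)
    flux = np.asarray(flux, dtype=np.float64)
    flux_err = np.asarray(flux_err, dtype=np.float64)

    # 1. Deduplicate on HJD (first occurrence kept); np.unique also sorts.
    hjd, idx = np.unique(hjd, return_index=True)
    flux, flux_err = flux[idx], flux_err[idx]
    if hjd.size < 2:
        raise ValueError("need at least two distinct observations")

    # 2. Normalize flux and its error (MAD definition is an assumption).
    mad = np.median(np.abs(flux - np.median(flux)))
    if mad == 0:
        raise ValueError("MAD is zero; flux cannot be normalized")
    flux_n = (flux - flux.mean()) / mad
    err_n = flux_err / mad

    # 3. Scale HJD to [0, 1] over the span of the light curve.
    span = hjd[-1] - hjd[0]
    time_n = (hjd - hjd[0]) / span

    # 4. Global scalars computed from the light curve.
    log_mad = np.log10(mad)
    delta_t = span / 365.0

    # 5-6. Tile the six global scalars and join them with columns 0-2.
    scalars = np.array([amplitude, period, lksl_statistic, rfr_score,
                        log_mad, delta_t])
    n = hjd.size
    features = np.column_stack([time_n, flux_n, err_n,
                                np.tile(scalars, (n, 1))])

    # 7. Center-crop long curves; zero-pad short ones at the end (assumed).
    if n > seq_len:
        start = (n - seq_len) // 2
        features, n = features[start:start + seq_len], seq_len
    x_enc = np.zeros((seq_len, 9), dtype=np.float32)
    mask = np.zeros(seq_len, dtype=np.float32)
    x_enc[:n] = features
    mask[:n] = 1.0

    # 8. Both tensors are float32.
    return x_enc, mask
```

Stacking the results of several light curves with `np.stack` gives the `[batch, 200, 9]` and `[batch, 200]` tensors the model expects.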
## Weights
Source: <https://huggingface.co/AstroMLCore/AstroM3-CLIP-photo>
The `model.safetensors` file is a standalone Informer checkpoint (classification head present but unused; loaded with `strict=False`).
Dataset: ASAS-SN V-band variable-star light curves (`AstroMLCore/AstroM3Processed`).
## Applying the model without ASAS-SN catalog features
Features 3–6 require the ASAS-SN catalog. For users applying the model to
other surveys, we measured how sensitive the mean embedding is to replacing
each feature. `rfr_score` was studied in detail.
### rfr_score substitution
`rfr_score` is the R² of a Random Forest Regressor fit to the phase-folded
light curve; it quantifies period quality
(Jayasinghe et al. 2019, MNRAS 486 1907, §5; arXiv:1809.07329).
In the ASAS-SN test set it ranges from −3.5 to 1.18 (median ≈ 0.38).
Setting `rfr_score` to the constant **0.392** at every timestep (the
empirical optimum, equal to the dataset median) minimises the mean cosine
distance from the true-feature embeddings:
| Metric | Value |
|--------|-------|
| Overall mean cosine distance | 0.049 ± 0.091 |
| Macro-average per class | 0.049 ± 0.058 |
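The substitution and the distance metric can be sketched as below. `substitute_rfr` and `cosine_distance` are hypothetical helpers; channel index 6 follows the feature table in the preprocessing section. Two assumptions are made: cosine distance means `1 - cosine similarity`, and only valid timesteps (`mask == 1`) receive the constant, so padded rows keep their original values.

```python
import numpy as np

RFR_CHANNEL = 6        # index of rfr_score in the 9-channel layout
RFR_CONSTANT = 0.392   # constant reported in this section


def substitute_rfr(x_enc, mask, value=RFR_CONSTANT):
    """Return a copy of x_enc with rfr_score set to `value` where mask == 1."""
    out = np.array(x_enc, dtype=np.float32, copy=True)
    valid = np.asarray(mask) > 0
    out[..., RFR_CHANNEL] = np.where(valid, value, out[..., RFR_CHANNEL])
    return out


def cosine_distance(a, b):
    """Row-wise 1 - cosine similarity for arrays of shape [batch, dim]."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    dot = (a * b).sum(axis=-1)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return 1.0 - dot / norms
```

Embedding both `x_enc` and `substitute_rfr(x_enc, mask)` and passing the two `mean` outputs to `cosine_distance` gives one distance per light curve.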
Per-class breakdown (5 samples per class from the ASAS-SN test split):
| Class | Mean dist | Std | True rfr mean |
|-------|-----------|-----|---------------|
| EW | 0.005 | 0.005 | −0.07 |
| SR | 0.004 | 0.003 | +0.50 |
| EA | 0.060 | 0.032 | +0.95 |
| RRAB | 0.020 | 0.011 | +0.83 |
| EB | 0.016 | 0.011 | +0.90 |
| ROT | 0.002 | 0.002 | +0.85 |
| RRC | 0.147 | 0.115 | −0.79 |
| HADS | 0.016 | 0.011 | +0.59 |
| M | 0.050 | 0.020 | +0.18 |
| DSCT | 0.170 | 0.182 | −0.86 |
Classes whose true `rfr_score` mean is far from 0.39 (RRC, DSCT) are most affected.
Using an out-of-range value (e.g. ±100) produces cosine distances of about
0.93–0.97, so staying within the training distribution is important.
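As a consistency check, the macro-average row can be reproduced from the per-class means in the table above. Reading the `±` as the population standard deviation (`ddof=0`) of the ten per-class means is an inference from the numbers, not something stated with the results.

```python
import numpy as np

# Per-class mean cosine distances, copied from the per-class table.
per_class_mean = {
    "EW": 0.005, "SR": 0.004, "EA": 0.060, "RRAB": 0.020, "EB": 0.016,
    "ROT": 0.002, "RRC": 0.147, "HADS": 0.016, "M": 0.050, "DSCT": 0.170,
}

values = np.array(list(per_class_mean.values()))
macro_mean = values.mean()
macro_std = values.std()  # population standard deviation (ddof=0)

# The two classes with the largest mean distance.
most_affected = sorted(per_class_mean, key=per_class_mean.get, reverse=True)[:2]

print(f"{macro_mean:.3f} ± {macro_std:.3f}")  # prints "0.049 ± 0.058"
print(most_affected)                          # prints "['DSCT', 'RRC']"
```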