---
tags:
  - astronomy
  - time-series
  - light-curves
  - variable-stars
  - onnx
library_name: onnx
license: cc-by-4.0
---

# AstroM3 (photo encoder)

**HuggingFace:** [light-curve/astrom3](https://huggingface.co/light-curve/astrom3)

## Paper

Rizhko, M. & Bloom, J. S. (2024). *AstroM³: A self-supervised multimodal model for astronomy*. arXiv:2411.08842.

```bibtex
@article{rizhko2024astrom3,
  author = {Rizhko, Mariia and Bloom, Joshua S.},
  title = {{AstroM³}: A self-supervised multimodal model for astronomy},
  journal = {arXiv preprint arXiv:2411.08842},
  year = {2024}
}
```

## Original code

<https://github.com/MeriDK/AstroM3> (git submodule at `models/astrom3/code/`)

## License

- **Code** (this repository): MIT — see [LICENSE](LICENSE).
- **Model weights** (`AstroMLCore/AstroM3-CLIP-photo`): Creative Commons Attribution 4.0 (CC BY 4.0).

## Model overview

AstroM3 is a self-supervised multimodal contrastive model for variable-star classification that jointly trains photometry (light-curve), spectra, and metadata encoders using a CLIP-style objective. This integration exports the **photo-only encoder** from the pretrained CLIP checkpoint (`AstroMLCore/AstroM3-CLIP-photo`) as an ONNX embedding model.

The photo encoder is an [Informer](https://ojs.aaai.org/index.php/AAAI/article/view/17325/17132) transformer (ProbSparse attention, 8 layers, d_model=128) trained on ASAS-SN V-band variable-star light curves. For ONNX export, the ProbSparse attention layers are replaced with standard (full) scaled dot-product attention, the computation that ProbSparse approximates. This makes the encoder fully ONNX-exportable, but embeddings may differ slightly from those of the original PyTorch model.

## Inputs

| Tensor | Shape | Description |
|--------|-------|-------------|
| `x_enc` | `[batch, 200, 9]` | Padded photometry features (9 channels per timestep; see Preprocessing steps) |
| `mask` | `[batch, 200]` | `1` for valid timesteps, `0` for padding |

## Outputs (ONNX)

Single file `astrom3.onnx` with two named outputs:

| Output | Shape | Aggregation |
|--------|-------|-------------|
| `mean` | `[batch, 128]` | Masked mean pool of encoder outputs |
| `sequence` | `[batch, 200, 128]` | Full per-timestep encoder outputs (padded positions are not masked out) |
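A minimal sketch of running the exported graph with ONNX Runtime and recomputing the `mean` output from `sequence` and `mask`. It assumes the Python `onnxruntime` and `numpy` packages, that both inputs are fed as `float32` (as the preprocessing steps require), and that `mean` is a plain average over positions where `mask == 1`. `run_astrom3` and `masked_mean` are hypothetical helper names, not part of this repository.

```python
import numpy as np


def run_astrom3(onnx_path, x_enc, mask):
    """Run the exported encoder; returns (mean [B, 128], sequence [B, 200, 128])."""
    # Imported here so masked_mean() stays usable without onnxruntime installed.
    import onnxruntime as ort

    sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    feeds = {"x_enc": np.asarray(x_enc, dtype=np.float32),
             "mask": np.asarray(mask, dtype=np.float32)}
    mean, sequence = sess.run(["mean", "sequence"], feeds)
    return mean, sequence


def masked_mean(sequence, mask):
    """Average per-timestep embeddings over valid (mask == 1) timesteps only.

    Assumption: this mirrors the pooling behind the `mean` output.
    """
    w = np.asarray(mask, dtype=sequence.dtype)[..., None]  # [B, 200, 1]
    counts = np.maximum(w.sum(axis=1), 1.0)                 # avoid division by zero
    return (sequence * w).sum(axis=1) / counts              # [B, 128]
```

On real inputs, `np.allclose(mean, masked_mean(sequence, mask), atol=1e-5)` is a quick consistency check; if it fails, the exported pooling differs from the assumption above.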

## Preprocessing steps

The 9 input channels per timestep are built by `preprocess_lc()` in the
upstream dataset (`AstroMLCore/AstroM3Dataset`):

| Index | Feature | How obtained |
|-------|---------|--------------|
| 0 | `time` (HJD scaled to [0, 1]) | per-observation |
| 1 | `flux` = `(flux − mean) / MAD` | per-observation |
| 2 | `flux_err` = `flux_err / MAD` | per-observation |
| 3 | `amplitude` | **ASAS-SN catalog scalar, replicated to every timestep** |
| 4 | `period` | **ASAS-SN catalog scalar, replicated** |
| 5 | `lksl_statistic` (Lafler-Kinman string length) | **ASAS-SN catalog scalar, replicated** |
| 6 | `rfr_score` (Random Forest Regressor R² for phase-folded LC) | **ASAS-SN catalog scalar, replicated** |
| 7 | `log10(MAD_flux)` | global scalar computed from LC, replicated |
| 8 | `delta_t` = `(max_HJD − min_HJD) / 365` | global scalar computed from LC, replicated |

Features 3–6 come directly from the ASAS-SN V-band variable-star catalog
(Jayasinghe et al. 2019) and are **not recomputed** from the light curve by
this codebase. Users applying this model to non-ASAS-SN data must supply
equivalent values themselves (e.g. a period from a Lomb-Scargle period finder
and a peak-to-peak amplitude).
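For non-ASAS-SN data, `period` and `amplitude` can be estimated roughly as sketched below. This uses the classic unweighted Lomb-Scargle periodogram in plain NumPy (`astropy.timeseries.LombScargle` is a well-tested alternative) and a simple max minus min amplitude. It assumes `hjd` is in days and that `mag` is in the same units as the catalog `amplitude` column, which this README does not state. The helper names are hypothetical, and this is not ASAS-SN's own period search, so values can differ from the catalog (Lomb-Scargle often returns half the true period for eclipsing binaries, for example).

```python
import numpy as np


def lomb_scargle_power(t, y, freqs, chunk=512):
    """Classic (unweighted) Lomb-Scargle periodogram; freqs in cycles per unit of t."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    freqs = np.asarray(freqs, dtype=float)
    power = np.empty(len(freqs))
    for start in range(0, len(freqs), chunk):  # chunk frequencies to bound memory use
        omega = 2.0 * np.pi * freqs[start:start + chunk, None]  # [F, 1]
        # Time offset tau makes the sine and cosine terms orthogonal at each frequency.
        tau = np.arctan2(np.sin(2 * omega * t).sum(axis=1),
                         np.cos(2 * omega * t).sum(axis=1)) / (2 * omega[:, 0])
        arg = omega * (t - tau[:, None])  # [F, N]
        c, s = np.cos(arg), np.sin(arg)
        power[start:start + chunk] = 0.5 * ((c @ y) ** 2 / (c * c).sum(axis=1)
                                            + (s @ y) ** 2 / (s * s).sum(axis=1))
    return power


def estimate_period_and_amplitude(hjd, mag, min_period=0.05, max_period=1000.0,
                                  oversample=5):
    """Best Lomb-Scargle period (days) and peak-to-peak amplitude of `mag`."""
    hjd = np.asarray(hjd, dtype=float)
    mag = np.asarray(mag, dtype=float)
    baseline = hjd.max() - hjd.min()
    df = 1.0 / (oversample * baseline)  # grid step: a fraction of the peak width 1/T
    freqs = np.arange(1.0 / max_period, 1.0 / min_period, df)
    power = lomb_scargle_power(hjd, mag, freqs)
    period = 1.0 / freqs[np.argmax(power)]
    amplitude = float(mag.max() - mag.min())  # simple max - min; sensitive to outliers
    return period, amplitude
```

The frequency step scales with the light-curve baseline because the periodogram peak narrows as the baseline grows; a fixed-size grid would miss peaks for multi-year light curves.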

Preprocessing recipe for a single light curve:

1. Deduplicate and sort observations by HJD.
2. Compute `mean` and `MAD` of the flux column; normalize flux and flux_err.
3. Scale HJD to [0, 1] over the span of the light curve.
4. Compute `log10(MAD_flux)` and `delta_t = (max_HJD − min_HJD) / 365`.
5. Obtain `amplitude`, `period`, `lksl_statistic`, `rfr_score` from the
   ASAS-SN catalog (or compute equivalents).
6. Tile the 6 global scalars across all timesteps; concatenate with columns
   0–2 to produce an `(N, 9)` array.
7. Pad or center-crop to 200 timesteps; set `mask = 0` for padded positions.
8. Use `float32` for all tensors.
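The recipe above can be sketched in NumPy for one light curve. This is not the upstream `preprocess_lc()`: the name `build_astrom3_inputs` is hypothetical, MAD is assumed to be the unscaled median absolute deviation, duplicates are assumed to be repeated HJD values (the first is kept), and short light curves are assumed to be zero-padded at the end.

```python
import numpy as np

SEQ_LEN = 200  # timesteps expected by the exported graph


def build_astrom3_inputs(hjd, flux, flux_err, amplitude, period,
                         lksl_statistic, rfr_score, seq_len=SEQ_LEN):
    """Return (x_enc [1, seq_len, 9], mask [1, seq_len]) as float32 arrays.

    Hypothetical helper. Assumes at least two distinct HJDs and a non-zero MAD.
    """
    hjd = np.asarray(hjd, dtype=np.float64)
    flux = np.asarray(flux, dtype=np.float64)
    flux_err = np.asarray(flux_err, dtype=np.float64)

    # 1. Deduplicate on HJD (keeping the first occurrence); np.unique also sorts.
    hjd, keep = np.unique(hjd, return_index=True)
    flux, flux_err = flux[keep], flux_err[keep]

    # 2. Normalize flux and flux_err (MAD assumed = unscaled median absolute deviation).
    mad = np.median(np.abs(flux - np.median(flux)))
    flux_n = (flux - flux.mean()) / mad
    flux_err_n = flux_err / mad

    # 3. Scale HJD to [0, 1] over the light-curve span.
    span = hjd.max() - hjd.min()
    time_n = (hjd - hjd.min()) / span

    # 4-6. Global scalars in channel order 3..8, tiled to every timestep.
    scalars = np.array([amplitude, period, lksl_statistic, rfr_score,
                        np.log10(mad), span / 365.0])
    feats = np.column_stack([time_n, flux_n, flux_err_n,
                             np.tile(scalars, (len(hjd), 1))])  # [N, 9]

    # 7. Center-crop long light curves; zero-pad short ones and mask the padding.
    n = len(feats)
    if n > seq_len:
        start = (n - seq_len) // 2
        feats, n = feats[start:start + seq_len], seq_len
    x_enc = np.zeros((seq_len, feats.shape[1]))
    x_enc[:n] = feats
    mask = np.zeros(seq_len)
    mask[:n] = 1.0

    # 8. float32 with a leading batch dimension of 1.
    return x_enc[None].astype(np.float32), mask[None].astype(np.float32)
```

Several light curves can be batched by concatenating the returned arrays along axis 0.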

## Weights

Source: <https://huggingface.co/AstroMLCore/AstroM3-CLIP-photo>

The `model.safetensors` file is a standalone Informer checkpoint (classification head present but unused; loaded with `strict=False`).

Dataset: ASAS-SN V-band variable-star light curves (`AstroMLCore/AstroM3Processed`).

## Applying the model without ASAS-SN catalog features

Features 3–6 require the ASAS-SN catalog. For users applying the model to
other surveys, we measured how much the mean embedding changes when each of
these features is replaced by a substitute value; `rfr_score` was studied in
detail.

### rfr_score substitution

`rfr_score` is the R² of a Random Forest Regressor fit to the phase-folded
light curve; it quantifies period quality
(Jayasinghe et al. 2019, MNRAS 486 1907, §5; arXiv:1809.07329).
In the ASAS-SN test set it ranges from −3.5 to 1.18 (median ≈ 0.38).

Setting `rfr_score` to the constant **0.392** at every timestep (the empirical
optimum, equal to the dataset median) minimizes the mean cosine distance from
the true-feature embeddings:

| Metric | Value |
|--------|-------|
| Overall mean cosine distance | 0.049 ± 0.091 |
| Macro-average per class | 0.049 ± 0.058 |
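A sketch of how the two summary rows can be computed, assuming NumPy, `x_enc`/`mask` arrays built by the preprocessing steps, and embeddings such as the `mean` output. The helper names are hypothetical. The substitute is written only on valid timesteps so padded rows stay zero, which is an assumption about how "every timestep" was applied. The ± values are taken as population standard deviations; for the macro row this agrees with the per-class table below, whose ten class means average to 0.049 with a population standard deviation of 0.058.

```python
import numpy as np

RFR_CHANNEL = 6   # rfr_score column in x_enc (see the feature table)
RFR_FILL = 0.392  # constant substitute reported above


def substitute_rfr(x_enc, mask, value=RFR_FILL):
    """Copy of x_enc with rfr_score set to `value` on every valid timestep."""
    x = x_enc.copy()
    x[..., RFR_CHANNEL] = np.where(mask > 0, value, x[..., RFR_CHANNEL])
    return x


def cosine_distance(a, b):
    """Row-wise 1 - cosine similarity between two [batch, d] embedding arrays."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=-1)


def summarize(distances, labels):
    """(overall mean, std) over samples and (macro mean, std) over per-class means."""
    distances, labels = np.asarray(distances), np.asarray(labels)
    class_means = np.array([distances[labels == c].mean() for c in np.unique(labels)])
    return ((distances.mean(), distances.std()),
            (class_means.mean(), class_means.std()))
```

With equal class sizes (5 samples per class here), the overall and macro means coincide; only the spreads differ.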

Per-class breakdown (5 samples per class from the ASAS-SN test split):

| Class | Mean cosine distance | Std | Mean true `rfr_score` |
|-------|----------------------|-----|-----------------------|
| EW    | 0.005 | 0.005 | −0.07 |
| SR    | 0.004 | 0.003 | +0.50 |
| EA    | 0.060 | 0.032 | +0.95 |
| RRAB  | 0.020 | 0.011 | +0.83 |
| EB    | 0.016 | 0.011 | +0.90 |
| ROT   | 0.002 | 0.002 | +0.85 |
| RRC   | 0.147 | 0.115 | −0.79 |
| HADS  | 0.016 | 0.011 | +0.59 |
| M     | 0.050 | 0.020 | +0.18 |
| DSCT  | 0.170 | 0.182 | −0.86 |

Classes whose mean true `rfr_score` is far from 0.392 (RRC, DSCT) are the most
affected. Substituting an out-of-range value (e.g. ±100) gives cosine distances
of about 0.93–0.97, so the substitute should stay within the training distribution.
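To check the optimum, or to choose a substitute for another survey, a sweep over constants restricted to the range observed in the ASAS-SN test set (−3.5 to 1.18) can be sketched as follows. `embed_fn` is a hypothetical callable mapping `(x_enc, mask)` to `[batch, d]` embeddings (for instance a wrapper returning the ONNX `mean` output); NumPy is assumed, and the fill value is written only on valid timesteps.

```python
import numpy as np

RFR_CHANNEL = 6                # rfr_score column in x_enc
OBSERVED_RANGE = (-3.5, 1.18)  # rfr_score range quoted for the ASAS-SN test set


def _unit(v):
    """Normalize rows to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def sweep_rfr_fill(embed_fn, x_enc, mask, candidates):
    """Mean cosine distance to the true-feature embeddings for each constant fill."""
    ref = _unit(embed_fn(x_enc, mask))
    scores = []
    for value in candidates:
        x = x_enc.copy()
        x[..., RFR_CHANNEL] = np.where(mask > 0, value, x[..., RFR_CHANNEL])
        emb = _unit(embed_fn(x, mask))
        scores.append(float(np.mean(1.0 - np.sum(ref * emb, axis=-1))))
    best = float(candidates[int(np.argmin(scores))])
    return best, np.array(scores)
```

A typical call would be `sweep_rfr_fill(embed_fn, x_enc, mask, np.linspace(*OBSERVED_RANGE, 100))`. Keeping the candidates inside the observed range avoids the out-of-range regime described above, where embeddings become nearly orthogonal to the true-feature ones.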