---
tags:
- astronomy
- time-series
- light-curves
- variable-stars
- onnx
library_name: onnx
license: cc-by-4.0
---

# AstroM3 (photo encoder)

**HuggingFace:** [light-curve/astrom3](https://huggingface.co/light-curve/astrom3)

## Paper

Rizhko, M. & Bloom, J. S. (2024). *AstroM³: A self-supervised multimodal model for astronomy*. arXiv:2411.08842.

```bibtex
@article{rizhko2024astrom3,
  author  = {Rizhko, Mariia and Bloom, Joshua S.},
  title   = {{AstroM³}: A self-supervised multimodal model for astronomy},
  journal = {arXiv preprint arXiv:2411.08842},
  year    = {2024}
}
```

## Original code

(git submodule at `models/astrom3/code/`)

## License

- **Code** (this repository): MIT; see [LICENSE](LICENSE).
- **Model weights** (`AstroMLCore/AstroM3-CLIP-photo`): Creative Commons Attribution 4.0 (CC BY 4.0).

## Model overview

AstroM3 is a self-supervised multimodal contrastive model for variable-star classification. It jointly trains photometry (light-curve), spectra, and metadata encoders with a CLIP-style objective. This integration exports the **photo-only encoder** from the pretrained CLIP checkpoint (`AstroMLCore/AstroM3-CLIP-photo`) as an ONNX embedding model.

The photo encoder is an [Informer](https://ojs.aaai.org/index.php/AAAI/article/view/17325/17132) transformer (ProbSparse attention, 8 layers, d_model=128) trained on ASAS-SN V-band variable-star light curves. For ONNX export, the ProbSparse attention layers are replaced with standard scaled dot-product attention, which is equivalent in expectation and fully ONNX-exportable.
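For readers unfamiliar with the term, standard scaled dot-product attention computes `softmax(Q Kᵀ / sqrt(d)) V`. The block below is a minimal single-head NumPy sketch of that formula only. It is not the export code used for this model, and the function name `scaled_dot_product_attention` is a hypothetical chosen for illustration.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Illustrative single-head attention: softmax(q k^T / sqrt(d)) v.

    q, k, v are arrays shaped [batch, length, d]. Returns the attended
    values and the attention weights.
    """
    d = q.shape[-1]
    # Pairwise query/key similarity, scaled by sqrt(d).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

Every query attends to every key here, which is what makes the operation expressible with plain matrix products and a softmax.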
## Inputs

| Tensor | Shape | Description |
|--------|-------|-------------|
| `x_enc` | `[batch, 200, 9]` | Padded photometry features (9 channels per timestep; see Preprocessing steps) |
| `mask` | `[batch, 200]` | `1` for valid timesteps, `0` for padding |

## Outputs (ONNX)

Single file `astrom3.onnx` with two named outputs:

| Output | Shape | Aggregation |
|--------|-------|-------------|
| `mean` | `[batch, 128]` | Masked mean pool of encoder outputs |
| `sequence` | `[batch, 200, 128]` | Full per-timestep encoder outputs (unmasked) |

## Preprocessing steps

The 9 input channels per timestep are built by `preprocess_lc()` in the upstream dataset (`AstroMLCore/AstroM3Dataset`):

| Index | Feature | How obtained |
|-------|---------|--------------|
| 0 | `time` (HJD scaled to [0, 1]) | per-observation |
| 1 | `flux` = `(flux − mean) / MAD` | per-observation |
| 2 | `flux_err` = `flux_err / MAD` | per-observation |
| 3 | `amplitude` | **ASAS-SN catalog scalar, replicated to every timestep** |
| 4 | `period` | **ASAS-SN catalog scalar, replicated** |
| 5 | `lksl_statistic` (Lafler-Kinman string length) | **ASAS-SN catalog scalar, replicated** |
| 6 | `rfr_score` (Random Forest Regressor R² for phase-folded LC) | **ASAS-SN catalog scalar, replicated** |
| 7 | `log10(MAD_flux)` | global scalar computed from LC, replicated |
| 8 | `delta_t` = `(max_HJD − min_HJD) / 365` | global scalar computed from LC, replicated |

Features 3–6 come directly from the ASAS-SN V-band variable-star catalog (Jayasinghe et al. 2019) and are **not recomputed** from the light curve by this codebase. Users applying this model to non-ASAS-SN data must provide equivalent values (e.g. run a Lomb-Scargle period finder and compute peak-to-peak amplitude themselves).

Preprocessing recipe for a single light curve:

1. Deduplicate and sort observations by HJD.
2. Compute `mean` and `MAD` of the flux column; normalize flux and flux_err.
3. Scale HJD to [0, 1] over the span of the light curve.
4. Compute `log10(MAD_flux)` and `delta_t = (max_HJD − min_HJD) / 365`.
5. Obtain `amplitude`, `period`, `lksl_statistic`, `rfr_score` from the ASAS-SN catalog (or compute equivalents).
6. Tile the 6 global scalars across all timesteps; concatenate with columns 0–2 to produce an `(N, 9)` array.
7. Pad or center-crop to 200 timesteps; set `mask = 0` for padded positions.
8. Use `float32` for all tensors.

## Weights

Source: The `model.safetensors` file is a standalone Informer checkpoint (classification head present but unused; loaded with `strict=False`).

Dataset: ASAS-SN V-band variable-star light curves (`AstroMLCore/AstroM3Processed`).

## Applying the model without ASAS-SN catalog features

Features 3–6 require the ASAS-SN catalog. For users applying the model to other surveys, we measured the sensitivity of the mean embedding to each feature being replaced. `rfr_score` was studied in detail.

### rfr_score substitution

`rfr_score` is the R² of a Random Forest Regressor fit to the phase-folded light curve; it quantifies period quality (Jayasinghe et al. 2019, MNRAS 486 1907, §5; arXiv:1809.07329). In the ASAS-SN test set it ranges from −3.5 to 1.18 (median ≈ 0.38).
Setting all timesteps to the constant **0.392** (the empirical optimum, approximately the dataset median) minimises mean cosine distance from the true-feature embeddings:

| Metric | Value |
|--------|-------|
| Overall mean cosine distance | 0.049 ± 0.091 |
| Macro-average per class | 0.049 ± 0.058 |

Per-class breakdown (5 samples per class from the ASAS-SN test split):

| Class | Mean dist | Std | True rfr mean |
|-------|-----------|-----|---------------|
| EW | 0.005 | 0.005 | −0.07 |
| SR | 0.004 | 0.003 | +0.50 |
| EA | 0.060 | 0.032 | +0.95 |
| RRAB | 0.020 | 0.011 | +0.83 |
| EB | 0.016 | 0.011 | +0.90 |
| ROT | 0.002 | 0.002 | +0.85 |
| RRC | 0.147 | 0.115 | −0.79 |
| HADS | 0.016 | 0.011 | +0.59 |
| M | 0.050 | 0.020 | +0.18 |
| DSCT | 0.170 | 0.182 | −0.86 |

Classes whose true rfr mean is far from 0.39 (RRC, DSCT) are most affected. Using an out-of-range value (e.g. ±100) gives cosine distances of roughly 0.93–0.97, so staying within the training distribution is important.
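The preprocessing recipe, combined with the constant `rfr_score` substitution, can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the upstream `preprocess_lc()`: the helper names `build_inputs` and `cosine_distance` are hypothetical, MAD is taken to be the median absolute deviation, and padding is appended as zeros at the end of the sequence. Check all three against the upstream code before relying on the output.

```python
import numpy as np

SEQ_LEN = 200         # fixed sequence length of the ONNX inputs
RFR_FALLBACK = 0.392  # constant rfr_score reported above as the empirical optimum


def build_inputs(hjd, flux, flux_err, amplitude, period, lksl_statistic,
                 rfr_score=RFR_FALLBACK):
    """Hypothetical helper: one light curve -> (x_enc, mask) with batch size 1."""
    hjd = np.asarray(hjd, dtype=np.float64)
    flux = np.asarray(flux, dtype=np.float64)
    flux_err = np.asarray(flux_err, dtype=np.float64)

    # Step 1: deduplicate and sort by HJD (np.unique returns sorted values).
    hjd, idx = np.unique(hjd, return_index=True)
    flux, flux_err = flux[idx], flux_err[idx]

    # Step 2: normalize. ASSUMPTION: MAD is the median absolute deviation.
    mad = np.median(np.abs(flux - np.median(flux)))
    if mad == 0:
        raise ValueError("MAD of the flux is zero; cannot normalize")
    flux_n = (flux - flux.mean()) / mad
    err_n = flux_err / mad

    # Step 3: scale HJD to [0, 1] over the full span of the light curve.
    span = hjd.max() - hjd.min()
    t = (hjd - hjd.min()) / span if span > 0 else np.zeros_like(hjd)

    # Steps 4-6: six global scalars, tiled to every timestep (channels 3-8).
    scalars = np.array([amplitude, period, lksl_statistic, rfr_score,
                        np.log10(mad), span / 365.0])
    n = hjd.size
    x = np.column_stack([t, flux_n, err_n, np.tile(scalars, (n, 1))])

    # Step 7: center-crop long curves; ASSUMPTION: zero-pad short ones at the end.
    if n > SEQ_LEN:
        start = (n - SEQ_LEN) // 2
        x = x[start:start + SEQ_LEN]
        n = SEQ_LEN

    # Step 8: float32 tensors with a leading batch dimension.
    x_enc = np.zeros((SEQ_LEN, 9), dtype=np.float32)
    x_enc[:n] = x
    mask = np.zeros(SEQ_LEN, dtype=np.float32)
    mask[:n] = 1.0
    return x_enc[None], mask[None]


def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

To reproduce a comparison like the tables above, one would build inputs twice (once with the catalog `rfr_score`, once with the default), run both through `astrom3.onnx`, and pass the two `mean` outputs to `cosine_distance`.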