---
license: mit
tags:
- astronomy
- time-series
- light-curves
- onnx
library_name: onnx
---

# Astromer 2

**HuggingFace:** [light-curve/astromer2](https://huggingface.co/light-curve/astromer2)

## Paper

Donoso-Oliva, C., Becker, I., Protopapas, P., Cabrera-Vives, G., Cádiz-Leyton, M., & Moreno-Cartagena, D. (2026). *Generalizing across astronomical surveys: Few-shot light curve classification with Astromer 2*. Astronomy & Astrophysics (in press).

```bibtex
@article{astromer2,
  author  = {Donoso-Oliva, C. and Becker, I. and Protopapas, P. and
             Cabrera-Vives, G. and C{\'a}diz-Leyton, M. and Moreno-Cartagena, D.},
  title   = {Generalizing across astronomical surveys: Few-shot light curve
             classification with {Astromer} 2},
  journal = {Astronomy \& Astrophysics},
  year    = {2026},
  note    = {In press},
}
```

## Original code

<https://github.com/astromer-science/main-code> (git submodule at `models/astromer2/code/`)

## License

MIT – see [LICENSE](LICENSE).

## Model overview

Astromer 2 is a BERT-inspired transformer encoder pretrained on 1.5 million MACHO light curves via masked magnitude prediction. The encoder processes irregularly sampled photometric time series (time, magnitude) using MJD-aware positional encoding and a trainable mask token. It produces per-timestep contextual embeddings that can be aggregated into a fixed-size representation for downstream tasks such as few-shot classification.

Default configuration: 6 attention blocks, 4 heads, head dimension 64 (d_model = 256), sequence length 200, embedding dimension 256.

## Input data format

Raw light curves are pairs `(time, mag)`:

- `time` – observation time in days. It need not be absolute MJD; any consistent time axis in days works, because the pipeline subtracts the per-window mean before the encoder sees it. The pretrained weights were produced from MACHO data with MJD ~48800–51700.
- `mag` – magnitude. MACHO instrumental magnitudes are typically negative (e.g. −10 to −3); the pipeline is not restricted to that range.

Photometric errors are **not used** at inference. The upstream preprocessing code expects a 3-column `[time, mag, err]` array internally, but errors only appear in the pretraining reconstruction-loss weights (`outputs['w_error']`), which are never passed to the encoder. Pass dummy zeros if you run the pipeline directly.
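The array layout can be sketched in NumPy (hypothetical values; an illustration, not the upstream API):

```python
import numpy as np

# Hypothetical raw light curve: irregularly sampled times (days) and mags.
time = np.array([48801.2, 48803.7, 48810.1, 48811.9])
mag = np.array([-7.31, -7.28, -7.35, -7.30])

# The upstream loaders expect [time, mag, err]; errors are unused at
# inference, so dummy zeros are fine.
lc = np.stack([time, mag, np.zeros_like(mag)], axis=1)
print(lc.shape)  # (4, 3)
```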

## Preprocessing steps

All steps are implemented in `code/src/data/loaders.py` (`get_loader`) and `code/src/data/preprocessing.py`.

### Step 1 – Windowing

The upstream code supports two windowing strategies via the `sampling` flag of `to_windows`:

- **`sampling=True` – random window** (used during pretraining): a single contiguous window of 200 observations is drawn at a uniformly random starting position. Light curves shorter than 200 observations are used in full.
- **`sampling=False` – sequential windows** (used for test-data generation): the light curve is divided into sequential, non-overlapping windows of 200 observations. A light curve of length *L* yields ⌈*L*/200⌉ windows; the last window may be shorter than 200 and is padded in step 3. Light curves shorter than 200 observations produce a single window. When a light curve produces multiple windows, each window yields a separate embedding vector; to obtain a single per-light-curve embedding, average the per-window embeddings.

The test data was generated with `sampling=False`.

Source: `src/data/preprocessing.py:to_windows`.
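A minimal sketch of the `sampling=False` strategy (illustrative NumPy, not the upstream `to_windows` implementation):

```python
import numpy as np

def to_sequential_windows(lc, max_len=200):
    """Split a [n_obs, 3] light curve into non-overlapping sequential
    windows of at most max_len observations. The last window may be
    shorter; it is padded later (step 3). Sketch only."""
    return [lc[i:i + max_len] for i in range(0, len(lc), max_len)]

lc = np.zeros((450, 3))                  # hypothetical light curve, L = 450
windows = to_sequential_windows(lc)
print([len(w) for w in windows])         # [200, 200, 50]
```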

### Step 2 – Zero-mean normalization

Subtract the per-window column mean from each column:

```
x_norm = x - mean(x, axis=0)   # x has shape [n_obs, 3]; columns: time, mag, err
```

After this step `times` = time − mean(time) and `input` = mag − mean(mag) are centred around zero.

Source: `src/data/preprocessing.py:standardize`.
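The same centring in NumPy (a sketch, not the upstream `standardize`):

```python
import numpy as np

def standardize_window(x):
    """Zero-mean each column of a [n_obs, 3] window (time, mag, err)."""
    return x - x.mean(axis=0, keepdims=True)

x = np.array([[100.0, -7.0, 0.0],
              [110.0, -7.4, 0.0],
              [120.0, -7.2, 0.0]])
x_norm = standardize_window(x)
# Column means are now zero; e.g. the time column becomes -10, 0, 10.
```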

### Step 3 – Padding and mask construction

Right-pad the normalised sequence to exactly 200 time steps with zeros. Construct `mask_in`:

```
mask_in[i] = 0  for i < n_obs    (real observation, visible to the encoder)
mask_in[i] = 1  for i >= n_obs   (padding, hidden from the encoder)
```

> **Note on mask convention:** the internal pipeline uses `mask_in=0` for visible positions and `mask_in=1` for padding/hidden positions. This is the opposite of the ONNX interface (see below).

Source: `src/data/masking.py:mask_sample`, padding block at the end.
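Padding and the internal mask together, as an illustrative sketch (not the upstream `mask_sample`):

```python
import numpy as np

def pad_and_mask(x, max_len=200):
    """Right-pad a [n_obs, 3] window with zeros to max_len and build the
    internal mask: 0 = real observation, 1 = padding. Sketch only."""
    n_obs = len(x)
    padded = np.zeros((max_len, x.shape[1]), dtype=x.dtype)
    padded[:n_obs] = x
    mask_in = np.ones((max_len, 1), dtype=x.dtype)
    mask_in[:n_obs] = 0.0
    return padded, mask_in

padded, mask_in = pad_and_mask(np.ones((120, 3), dtype="float32"))
print(padded.shape, mask_in.shape, int(mask_in.sum()))  # (200, 3) (200, 1) 80
```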

### Step 4 – Format encoder inputs

Assemble the three encoder inputs from the normalised, padded array:

| Tensor | Source | Shape |
|--------|--------|-------|
| `input` | normalised magnitude column | `[batch, 200, 1]` |
| `times` | normalised time column | `[batch, 200, 1]` |
| `mask_in` | constructed in step 3 | `[batch, 200, 1]` |

The normalised error column is **not** fed to the encoder. Errors appear only in the pretraining reconstruction loss.

Source: `src/data/loaders.py:format_inp_astromer` (`aversion='base'`).
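Slicing a padded window into the batched tensors can be sketched as follows (a sketch mirroring the table above, not the `format_inp_astromer` code itself):

```python
import numpy as np

# Hypothetical normalised, padded window ([200, 3]) and its mask ([200, 1]).
padded = np.zeros((200, 3), dtype="float32")
mask_in = np.ones((200, 1), dtype="float32")

feed = {
    "times": padded[None, :, 0:1],    # [batch, 200, 1] centred times
    "input": padded[None, :, 1:2],    # [batch, 200, 1] centred magnitudes
    "mask_in": mask_in[None],         # [batch, 200, 1]
}
# The error column padded[:, 2] is deliberately dropped.
```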

## Inputs (ONNX)

The exported ONNX model uses a **user-friendly mask convention** that is the inverse of the internal pipeline's:

| Tensor | Shape | Description |
|--------|-------|-------------|
| `input` | `[batch, 200, 1]` | `mag - mean(mag)` over the window (step 2 above) |
| `times` | `[batch, 200, 1]` | `time - mean(time)` over the window (step 2 above) |
| `mask_in` | `[batch, 200, 1]` | **1 = valid observation, 0 = padding** |

The ONNX wrapper inverts `mask_in` internally before passing it to the encoder, so consumers can use the intuitive convention.
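The two conventions relate by a simple flip; a sketch (the commented inference call assumes `astromer2.onnx` is available locally):

```python
import numpy as np

# ONNX convention: 1 = valid, 0 = padding. A window with 120 real
# observations padded to 200 therefore looks like this:
n_obs, max_len = 120, 200
mask_onnx = np.zeros((1, max_len, 1), dtype="float32")
mask_onnx[:, :n_obs] = 1.0

# The wrapper flips it back to the internal convention (0 = valid):
mask_internal = 1.0 - mask_onnx
print(int(mask_onnx.sum()), int(mask_internal.sum()))  # 120 80

# Usage sketch:
# import onnxruntime as ort
# sess = ort.InferenceSession("astromer2.onnx")
# (mean_emb,) = sess.run(["mean"], {"input": input_, "times": times,
#                                   "mask_in": mask_onnx})
```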

## Outputs (ONNX)

A single file `astromer2.onnx` exposes three named outputs:

| Output name | Shape | Aggregation |
|-------------|-------|-------------|
| `mean` | `[batch, 256]` | Masked mean pooling: `sum(z * mask_in) / sum(mask_in)` |
| `max` | `[batch, 256]` | Masked max pooling over valid timesteps |
| `sequence` | `[batch, 200, 256]` | Per-timestep features |

Request only the output(s) you need via `session.run(["mean"], feed)`; onnxruntime will prune unused computation.

ONNX opset: 13.
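The `mean` and `max` aggregations can be reproduced from `sequence` in NumPy (a sketch of the pooling described above, not the exported graph):

```python
import numpy as np

def masked_mean_max(z, mask):
    """Replicate the `mean` and `max` outputs from per-timestep features.
    z: [batch, 200, 256]; mask: [batch, 200, 1] with 1 = valid."""
    mean = (z * mask).sum(axis=1) / mask.sum(axis=1)
    # For the max, push padded timesteps to -inf so they never win.
    z_masked = np.where(mask > 0, z, -np.inf)
    return mean, z_masked.max(axis=1)

z = np.random.randn(2, 200, 256).astype("float32")
mask = np.zeros((2, 200, 1), dtype="float32")
mask[:, :150] = 1.0                      # 150 valid timesteps per sample
mean, mx = masked_mean_max(z, mask)
print(mean.shape, mx.shape)  # (2, 256) (2, 256)
```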

## Weights

- Source: [Zenodo record 18207945](https://zenodo.org/records/18207945)
- Training dataset: MACHO (1.5 million light curves, V and R bands)
- Checkpoint: `astromer_v2/macho/`

The test-data parquet file was generated with these MACHO weights and `sampling=False` (sequential windows).
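When sequential windowing yields several windows per light curve, as in the test-data generation above, the per-window embeddings can be averaged into one vector per light curve; a sketch with hypothetical embeddings:

```python
import numpy as np

# Hypothetical: three per-window embeddings from one light curve.
window_embs = np.random.randn(3, 256).astype("float32")
lc_emb = window_embs.mean(axis=0)  # one [256] vector per light curve
print(lc_emb.shape)  # (256,)
```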
|