---
license: mit
tags:
- astronomy
- time-series
- light-curves
- onnx
library_name: onnx
---

# Astromer 2

**HuggingFace:** [light-curve/astromer2](https://huggingface.co/light-curve/astromer2)

## Paper

Donoso-Oliva, C., Becker, I., Protopapas, P., Cabrera-Vives, G., Cádiz-Leyton, M., & Moreno-Cartagena, D. (2026). *Generalizing across astronomical surveys: Few-shot light curve classification with Astromer 2*. Astronomy & Astrophysics (in press).

```bibtex
@article{astromer2,
  author  = {Donoso-Oliva, C. and Becker, I. and Protopapas, P. and Cabrera-Vives, G. and C{\'a}diz-Leyton, M. and Moreno-Cartagena, D.},
  title   = {Generalizing across astronomical surveys: Few-shot light curve classification with {Astromer} 2},
  journal = {Astronomy \& Astrophysics},
  year    = {2026},
  note    = {In press},
}
```

## Original code

(git submodule at `models/astromer2/code/`)

## License

MIT — see [LICENSE](LICENSE).

## Model overview

Astromer 2 is a BERT-inspired transformer encoder pretrained on 1.5 million MACHO light curves via masked magnitude prediction. The encoder processes irregularly sampled photometric time series (time, magnitude) using MJD-aware positional encoding and a trainable mask token. It produces per-timestep contextual embeddings that can be aggregated into a fixed-size representation for downstream tasks such as few-shot classification.

Default configuration: 6 attention blocks, 4 heads, head dimension 64 (d_model = 256), sequence length 200, embedding dimension 256.

## Input data format

Raw light curves are pairs `(time, mag)`:

- `time` — observation time in days. Need not be absolute MJD; any consistent time axis in days works because the pipeline subtracts the per-window mean before the encoder sees it. The pretrained weights were produced from MACHO data with MJD ~48800–51700.
- `mag` — magnitude. MACHO instrumental magnitudes are typically negative (e.g. −10 to −3); the pipeline is not restricted to that range.

Photometric errors are **not used** at inference.
The upstream preprocessing code expects a 3-column `[time, mag, err]` array internally, but errors only appear in the pretraining reconstruction-loss weights (`outputs['w_error']`), which are never passed to the encoder. Pass dummy zeros if you run the pipeline directly.

## Preprocessing steps

All steps are implemented in `code/src/data/loaders.py` (`get_loader`) and `code/src/data/preprocessing.py`.

### Step 1 — Windowing

The upstream code supports two windowing strategies via the `sampling` flag of `to_windows`:

- **`sampling=True` — random window** (used during pretraining): a single contiguous window of 200 observations is drawn at a uniformly random starting position. Light curves shorter than 200 observations are used in full.
- **`sampling=False` — sequential windows** (used for test-data generation): the light curve is divided into sequential, non-overlapping windows of 200 observations. A light curve of length *L* yields ⌊*L*/200⌋ + 1 windows; the last window may be shorter than 200 and is padded in step 3. Light curves shorter than 200 observations produce a single window.

When a light curve produces multiple windows, each window yields a separate embedding vector; to obtain a single per-light-curve embedding, average the per-window embeddings.

Test data is generated with `sampling=False`.

Source: `src/data/preprocessing.py:to_windows`.

### Step 2 — Zero-mean normalization

Subtract the per-window column mean from each column:

```
x_norm = x - mean(x, axis=0)   # x has shape [n_obs, 3]; columns: time, mag, err
```

After this step `times` = time − mean(time) and `input` = mag − mean(mag) are centred around zero.

Source: `src/data/preprocessing.py:standardize`.

### Step 3 — Padding and mask construction

Right-pad the normalised sequence to exactly 200 time steps with zeros.
Construct `mask_in`:

```
mask_in[i] = 0  for i <  n_obs   (real observation — visible to encoder)
mask_in[i] = 1  for i >= n_obs   (padding — hidden from encoder)
```

> **Note on mask convention:** the internal pipeline uses `mask_in=0` for visible positions and `mask_in=1` for padding/hidden positions. This is the opposite of the ONNX interface (see below).

Source: `src/data/masking.py:mask_sample`, padding block at the end.

### Step 4 — Format encoder inputs

Assemble the three encoder inputs: `input` and `times` are extracted from the normalised, padded array, and `mask_in` is constructed in step 3:

| Tensor | Source | Shape |
|--------|--------|-------|
| `input` | normalised magnitude column | `[batch, 200, 1]` |
| `times` | normalised time column | `[batch, 200, 1]` |
| `mask_in` | constructed in step 3 | `[batch, 200, 1]` |

The normalised error column is **not** fed to the encoder. Errors appear only in the pretraining reconstruction loss.

Source: `src/data/loaders.py:format_inp_astromer` (`aversion='base'`).

## Inputs (ONNX)

The exported ONNX models use a **user-friendly mask convention** that is the inverse of the internal pipeline:

| Tensor | Shape | Description |
|--------|-------|-------------|
| `input` | `[batch, 200, 1]` | `mag − mean(mag)` over the window (step 2 above) |
| `times` | `[batch, 200, 1]` | `time − mean(time)` over the window (step 2 above) |
| `mask_in` | `[batch, 200, 1]` | **1 = valid observation, 0 = padding** |

The ONNX wrapper inverts `mask_in` internally before passing it to the encoder, so consumers can use the intuitive convention.

## Outputs (ONNX)

Single file `astromer2.onnx` with three named outputs:

| Output name | Shape | Aggregation |
|-------------|-------|-------------|
| `mean` | `[batch, 256]` | Masked mean pooling: `sum(z * mask_in) / sum(mask_in)` |
| `max` | `[batch, 256]` | Masked max pooling over valid timesteps |
| `sequence` | `[batch, 200, 256]` | Per-timestep features |

Request only the output(s) you need via `session.run(["mean"], feed)` — onnxruntime will prune unused computation.
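Putting steps 2–4 and the ONNX mask convention together, here is a minimal preprocessing sketch in NumPy. It is an illustration of the conventions described above, not the upstream implementation; the synthetic light curve and the commented-out inference lines (which assume `astromer2.onnx` has been downloaded locally) are examples only.

```python
import numpy as np

SEQ_LEN = 200

def preprocess(time, mag, seq_len=SEQ_LEN):
    """Zero-mean normalise one window and right-pad to seq_len (steps 2-3),
    using the ONNX mask convention: 1 = valid observation, 0 = padding."""
    n = len(time)
    assert n <= seq_len, "window longer than seq_len; apply step 1 windowing first"
    times = np.zeros((seq_len, 1), dtype=np.float32)
    inp = np.zeros((seq_len, 1), dtype=np.float32)
    mask_in = np.zeros((seq_len, 1), dtype=np.float32)
    times[:n, 0] = time - time.mean()   # time - mean(time)
    inp[:n, 0] = mag - mag.mean()       # mag - mean(mag)
    mask_in[:n, 0] = 1.0                # padding positions stay 0
    return times, inp, mask_in

# Synthetic irregularly sampled light curve with 150 observations.
rng = np.random.default_rng(0)
time = np.sort(rng.uniform(0.0, 500.0, 150))
mag = -7.0 + 0.3 * np.sin(2 * np.pi * time / 30.0) + 0.05 * rng.normal(size=150)
times, inp, mask_in = preprocess(time, mag)

# Inference (requires onnxruntime and the downloaded model file):
# import onnxruntime as ort
# sess = ort.InferenceSession("astromer2.onnx")
# feed = {"input": inp[None], "times": times[None], "mask_in": mask_in[None]}
# (mean_emb,) = sess.run(["mean"], feed)   # shape (1, 256)
```

Note the `[None]` indexing, which adds the leading batch dimension expected by the `[batch, 200, 1]` input shapes.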
ONNX opset: 13.

## Weights

- Source: [Zenodo record 18207945](https://zenodo.org/records/18207945)
- Training dataset: MACHO (1.5 million light curves, V and R bands)
- Checkpoint: `astromer_v2/macho/`

The test-data parquet file was generated with these MACHO weights and `sampling=False` (sequential windows).
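For light curves longer than 200 observations, sequential windowing (step 1, `sampling=False`) plus per-window embedding averaging can be sketched as below. This is a simplified illustration, not the upstream `to_windows`; edge cases (e.g. lengths that are exact multiples of 200) may be handled differently there, and `run_model` is a hypothetical helper wrapping the ONNX session and the preprocessing of steps 2–3.

```python
import numpy as np

def to_windows(time, mag, seq_len=200):
    """Split one light curve into sequential, non-overlapping windows
    (the last one may be shorter than seq_len and is padded later)."""
    return [
        (time[start:start + seq_len], mag[start:start + seq_len])
        for start in range(0, len(time), seq_len)
    ]

# A 450-point light curve splits into windows of 200, 200, and 50 points.
time = np.linspace(0.0, 1000.0, 450)
mag = -6.0 + 0.1 * np.sin(time)
windows = to_windows(time, mag)

# With an ONNX session wrapped in a hypothetical run_model(time, mag) -> (256,):
# embs = [run_model(t, m) for t, m in windows]
# light_curve_emb = np.mean(embs, axis=0)   # single per-light-curve embedding
```

Averaging the per-window `mean` outputs follows the aggregation recommended in step 1 for multi-window light curves.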