---
license: mit
tags:
  - astronomy
  - time-series
  - light-curves
  - onnx
library_name: onnx
---

# Astromer 2

**HuggingFace:** [light-curve/astromer2](https://huggingface.co/light-curve/astromer2)

## Paper

Donoso-Oliva, C., Becker, I., Protopapas, P., Cabrera-Vives, G., CΓ‘diz-Leyton, M., & Moreno-Cartagena, D. (2026). *Generalizing across astronomical surveys: Few-shot light curve classification with Astromer 2*. Astronomy & Astrophysics (in press).

```bibtex
@article{astromer2,
  author  = {Donoso-Oliva, C. and Becker, I. and Protopapas, P. and
             Cabrera-Vives, G. and C{\'a}diz-Leyton, M. and Moreno-Cartagena, D.},
  title   = {Generalizing across astronomical surveys: Few-shot light curve
             classification with {Astromer} 2},
  journal = {Astronomy \& Astrophysics},
  year    = {2026},
  note    = {In press},
}
```

## Original code

<https://github.com/astromer-science/main-code> (git submodule at `models/astromer2/code/`)

## License

MIT β€” see [LICENSE](LICENSE).

## Model overview

Astromer 2 is a BERT-inspired transformer encoder pretrained on 1.5 million MACHO light curves via masked magnitude prediction. The encoder processes irregularly sampled photometric time series (time, magnitude) using MJD-aware positional encoding and a trainable mask token. It produces per-timestep contextual embeddings that can be aggregated into a fixed-size representation for downstream tasks such as few-shot classification.

Default configuration: 6 attention blocks, 4 heads, head dimension 64 (d_model = 256), sequence length 200, embedding dimension 256.

## Input data format

Raw light curves are pairs `(time, mag)`:
- `time` β€” observation time in days. Need not be absolute MJD; any consistent time axis in days works because the pipeline subtracts the per-window mean before the encoder sees it. The pretrained weights were produced from MACHO data with MJD ~48800–51700.
- `mag` β€” magnitude. MACHO instrumental magnitudes are typically negative (e.g. βˆ’10 to βˆ’3); the pipeline is not restricted to that range.

Photometric errors are **not used** at inference. The upstream preprocessing code expects a 3-column `[time, mag, err]` array internally, but errors only appear in the pretraining reconstruction-loss weights (`outputs['w_error']`), which are never passed to the encoder. Pass dummy zeros if you run the pipeline directly.
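As a concrete illustration, a raw light curve can be packed into the 3-column `[time, mag, err]` array the upstream pipeline expects, with a dummy zero error column. A minimal sketch; the observation values and variable names are illustrative:

```python
import numpy as np

# Hypothetical light curve: 5 irregularly sampled observations.
time = np.array([49001.3, 49004.7, 49010.1, 49011.9, 49020.4])  # days
mag = np.array([-6.12, -6.05, -6.20, -6.08, -6.15])             # magnitudes

# Errors are unused at inference, so a dummy zero column suffices.
lc = np.stack([time, mag, np.zeros_like(mag)], axis=1)  # shape [n_obs, 3]
```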

## Preprocessing steps

All steps are implemented in `code/src/data/loaders.py` (`get_loader`) and `code/src/data/preprocessing.py`.

### Step 1 β€” Windowing

The upstream code supports two windowing strategies via the `sampling` flag of `to_windows`:

- **`sampling=True` β€” random window** (used during pretraining): a single contiguous window of 200 observations is drawn at a uniformly random starting position. Light curves shorter than 200 observations are used in full.
- **`sampling=False` β€” sequential windows** (used for test-data generation): the light curve is divided into sequential, non-overlapping windows of 200 observations. A light curve of length *L* yields ⌊*L*/200βŒ‹ + 1 windows; the last window may be shorter than 200 and is padded in step 3. Light curves shorter than 200 observations produce a single window. When a light curve produces multiple windows, each window yields a separate embedding vector; to obtain a single per-light-curve embedding, average the per-window embeddings.

Test data is generated with `sampling=False`.

Source: `src/data/preprocessing.py:to_windows`.
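The sequential (`sampling=False`) strategy can be sketched as plain non-overlapping chunking. This is an illustrative approximation, not the upstream implementation; `to_windows` is authoritative, and its exact window count for lengths that divide evenly by 200 may differ from this sketch:

```python
import numpy as np

def sequential_windows(lc, max_len=200):
    """Split a [n_obs, 3] light curve into sequential, non-overlapping
    windows of at most max_len observations (sketch of sampling=False)."""
    return [lc[i:i + max_len] for i in range(0, len(lc), max_len)]

lc = np.zeros((450, 3))
windows = sequential_windows(lc)
# 450 observations -> windows of length 200, 200, 50
```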

### Step 2 β€” Zero-mean normalization

Subtract the per-window column mean from each column:

```
x_norm = x - mean(x, axis=0)   # x has shape [n_obs, 3]; columns: time, mag, err
```

After this step `times` = time βˆ’ mean(time) and `input` = mag βˆ’ mean(mag) are centred around zero.

Source: `src/data/preprocessing.py:standardize`.

### Step 3 β€” Padding and mask construction

Right-pad the normalised sequence to exactly 200 time steps with zeros. Construct `mask_in`:

```
mask_in[i] = 0   for i < n_obs   (real observation β€” visible to encoder)
mask_in[i] = 1   for i >= n_obs  (padding β€” hidden from encoder)
```

> **Note on mask convention:** the internal pipeline uses `mask_in=0` for visible positions and `mask_in=1` for padding/hidden positions. This is the opposite of the ONNX interface (see below).

Source: `src/data/masking.py:mask_sample`, padding block at the end.

### Step 4 β€” Format encoder inputs

Extract the two encoder inputs from the normalised, padded array:

| Tensor | Source | Shape |
|--------|--------|-------|
| `input` | normalised magnitude column | `[batch, 200, 1]` |
| `times` | normalised time column | `[batch, 200, 1]` |
| `mask_in` | constructed in step 3 | `[batch, 200, 1]` |

The normalised error column is **not** fed to the encoder. Errors appear only in the pretraining reconstruction loss.

Source: `src/data/loaders.py:format_inp_astromer` (`aversion='base'`).
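Steps 2–4 can be sketched together in numpy, using the internal mask convention (0 = visible, 1 = padding). A sketch, not the upstream code; function and variable names are illustrative:

```python
import numpy as np

def preprocess_window(window, max_len=200):
    """Normalise, right-pad, and split one [n_obs, 3] window into
    encoder inputs (internal convention: mask_in 1 = padding)."""
    n_obs = len(window)
    x = window - window.mean(axis=0)               # step 2: zero-mean per column
    x = np.pad(x, ((0, max_len - n_obs), (0, 0)))  # step 3: right-pad with zeros
    mask_in = np.ones((max_len, 1))
    mask_in[:n_obs] = 0                            # 0 = real observation
    times = x[:, 0:1]   # normalised time column
    inputs = x[:, 1:2]  # normalised magnitude column; err column unused
    return inputs, times, mask_in

window = np.random.default_rng(0).normal(size=(150, 3))
inp, t, m = preprocess_window(window)
# shapes: [200, 1] each; add a leading batch axis before inference
```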

## Inputs (ONNX)

The exported ONNX models use a **user-friendly mask convention** that is the inverse of the internal pipeline:

| Tensor | Shape | Description |
|--------|-------|-------------|
| `input` | `[batch, 200, 1]` | `mag βˆ’ mean(mag)` over the window (step 2 above) |
| `times` | `[batch, 200, 1]` | `time βˆ’ mean(time)` over the window (step 2 above) |
| `mask_in` | `[batch, 200, 1]` | **1 = valid observation, 0 = padding** |

The ONNX wrapper inverts `mask_in` internally before passing it to the encoder, so consumers can use the intuitive convention.
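A feed dict under this user-friendly convention might be built as follows. A sketch with synthetic data; tensor names and shapes follow the table above, everything else is illustrative:

```python
import numpy as np

max_len, n_obs = 200, 150
rng = np.random.default_rng(0)
mag = rng.normal(size=n_obs)
time = np.sort(rng.uniform(0, 500, size=n_obs))

inputs = np.zeros((1, max_len, 1), dtype=np.float32)
times = np.zeros((1, max_len, 1), dtype=np.float32)
mask_in = np.zeros((1, max_len, 1), dtype=np.float32)

inputs[0, :n_obs, 0] = mag - mag.mean()    # mag - mean(mag) over the window
times[0, :n_obs, 0] = time - time.mean()   # time - mean(time) over the window
mask_in[0, :n_obs, 0] = 1.0                # ONNX convention: 1 = valid

feed = {"input": inputs, "times": times, "mask_in": mask_in}
# e.g. onnxruntime.InferenceSession("astromer2.onnx").run(["mean"], feed)
```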

## Outputs (ONNX)

Single file `astromer2.onnx` with three named outputs:

| Output name | Shape | Aggregation |
|-------------|-------|-------------|
| `mean` | `[batch, 256]` | Masked mean pooling: `sum(z * mask_in) / sum(mask_in)` |
| `max`  | `[batch, 256]` | Masked max pooling over valid timesteps |
| `sequence` | `[batch, 200, 256]` | Per-timestep features |

Request only the output(s) you need via `session.run(["mean"], feed)` β€” onnxruntime will prune unused computation.

ONNX opset: 13.
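The `mean` and `max` aggregations in the table can also be reproduced from the `sequence` output alone. A numpy sketch assuming the ONNX mask convention (1 = valid), with synthetic data standing in for model outputs:

```python
import numpy as np

def masked_pool(sequence, mask_in):
    """Masked mean and max pooling over valid timesteps.
    sequence: [batch, 200, 256]; mask_in: [batch, 200, 1], 1 = valid."""
    mean = (sequence * mask_in).sum(axis=1) / mask_in.sum(axis=1)
    # For max pooling, push padded positions to -inf so they never win.
    neg_inf = np.where(mask_in > 0, 0.0, -np.inf)
    mx = (sequence + neg_inf).max(axis=1)
    return mean, mx

rng = np.random.default_rng(0)
z = rng.normal(size=(2, 200, 256))       # stand-in for `sequence`
m = np.zeros((2, 200, 1))
m[:, :150] = 1.0                         # first 150 timesteps valid
mean, mx = masked_pool(z, m)
# mean and mx each have shape [2, 256]
```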

## Weights

- Source: [Zenodo record 18207945](https://zenodo.org/records/18207945)
- Training dataset: MACHO (1.5 million light curves, V and R bands)
- Checkpoint: `astromer_v2/macho/`

The test-data parquet file was generated with these MACHO weights and `sampling=False` (sequential windows).