…head dimension of 64, producing 256-dimensional embeddings.

## Inputs

All tensors are `float32`. Both magnitudes and times are **zero-mean normalized** before
passing to the model (subtract the per-window mean of each).

| Tensor | Shape | Description |
|--------|-------|-------------|
| `input` | `[batch, 200, 1]` | `mag − mean(mag)` over the window |
| `times` | `[batch, 200, 1]` | `time − mean(time)` over the window |
| `mask_in` | `[batch, 200, 1]` | 1 = valid observation, 0 = padded position |

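As a minimal illustration of the per-window normalization the table describes (toy values, not from the dataset):

```python
import numpy as np

# Toy window of four observations; real windows hold up to 200.
mag = np.array([19.1, 19.4, 18.9, 19.0], dtype=np.float32)
time = np.array([0.0, 1.5, 3.2, 7.8], dtype=np.float32)

# Subtract the per-window mean of each column, as the model expects.
input_col = mag - mag.mean()
times_col = time - time.mean()
```

Both columns now have approximately zero mean over the window; padding, masking, and reshaping to `[batch, 200, 1]` are covered in the preprocessing steps.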
## Outputs (ONNX)

ONNX opset: 13.

## Preprocessing steps

Photometric errors are **not used** at inference — only time and magnitude are needed.
The upstream code internally expects a 3-column `[time, mag, err]` array, but the error
column is dead code in the encoder (extracted but never used). Pass dummy zeros if
running the pipeline directly.

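If you do call the upstream pipeline directly, the 3-column array can be assembled with a dummy error column like this (a sketch; the variable names are illustrative, only the `[time, mag, err]` layout comes from the upstream code):

```python
import numpy as np

# Illustrative observations (times in days, magnitudes).
time = np.array([0.0, 1.5, 3.2], dtype=np.float32)
mag = np.array([19.1, 19.4, 18.9], dtype=np.float32)

# The encoder extracts but never uses the error column, so zeros are safe.
lc = np.column_stack([time, mag, np.zeros_like(time)])  # shape [n_obs, 3]
```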
1. **Collect** observation times (in days — need not be absolute MJD) and magnitudes.
2. **Truncate** each light curve to at most 200 observations (take the first 200 if longer).
3. **Zero-mean normalize** both columns over the window:
   `time -= time.mean()`, `mag -= mag.mean()`
4. **Pad** shorter light curves to exactly 200 positions: append zeros to both `input` and `times`.
5. **Build the mask**: set `mask_in = 1` for real observations, `mask_in = 0` for padded positions.
6. **Reshape** each tensor to `[batch, 200, 1]` (add trailing dimension).
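The six steps above can be sketched as a single NumPy helper (an illustrative implementation, not the upstream code; the function name is ours):

```python
import numpy as np

SEQ_LEN = 200  # fixed by the pretrained weights

def preprocess(time, mag, seq_len=SEQ_LEN):
    """Turn one light curve into the `input`, `times`, `mask_in` tensors."""
    time = np.asarray(time, dtype=np.float32)
    mag = np.asarray(mag, dtype=np.float32)
    # 2. Truncate to at most seq_len observations (first 200 if longer).
    time, mag = time[:seq_len], mag[:seq_len]
    # 3. Zero-mean normalize both columns over the window.
    time = time - time.mean()
    mag = mag - mag.mean()
    # 4. Pad shorter light curves with zeros up to seq_len positions.
    n = len(time)
    inp = np.pad(mag, (0, seq_len - n))
    tms = np.pad(time, (0, seq_len - n))
    # 5. Mask: 1 for real observations, 0 for padded positions.
    mask = np.zeros(seq_len, dtype=np.float32)
    mask[:n] = 1.0
    # 6. Reshape to [batch, seq_len, 1] with batch = 1.
    return (inp.reshape(1, seq_len, 1),
            tms.reshape(1, seq_len, 1),
            mask.reshape(1, seq_len, 1))
```

The returned arrays can be fed to the ONNX session as `input`, `times`, and `mask_in` respectively.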

The sequence length is fixed at 200 by the pretrained weights.

Source: [Zenodo record 18207945](https://zenodo.org/records/18207945)
Training dataset: MACHO R-band light curves
Checkpoint: `pt_macho_v1_2021.zip`

The test-data parquet file was generated with these MACHO weights using truncation to the first 200 observations.