…head dimension of 64, producing 256-dimensional embeddings.

## Inputs

All tensors are `float32`. Both magnitudes and times are **zero-mean normalized** before
passing to the model (subtract the per-window mean of each).

| Tensor | Shape | Description |
|--------|-------|-------------|
| `input` | `[batch, 200, 1]` | `mag − mean(mag)` over the window |
| `times` | `[batch, 200, 1]` | `time − mean(time)` over the window |
| `mask_in` | `[batch, 200, 1]` | 1 = valid observation, 0 = padded position |

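As a minimal illustration of the per-window normalization the table describes (toy values, not from the dataset):

```python
import numpy as np

# Toy window of four observations; real windows hold up to 200.
mag = np.array([19.1, 19.4, 18.9, 19.0], dtype=np.float32)
time = np.array([0.0, 1.5, 3.2, 7.8], dtype=np.float32)

# Subtract the per-window mean of each column, as the model expects.
input_col = mag - mag.mean()
times_col = time - time.mean()
```

Both columns now have approximately zero mean over the window; padding, masking, and reshaping to `[batch, 200, 1]` are covered in the preprocessing steps.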
## Outputs (ONNX)

ONNX opset: 13.

## Preprocessing steps

Photometric errors are **not used** at inference — only time and magnitude are needed.
The upstream code internally expects a 3-column `[time, mag, err]` array, but the error
column is dead code in the encoder (extracted but never used). Pass dummy zeros if
running the pipeline directly.

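If you do call the upstream pipeline directly, the 3-column array can be assembled with a dummy error column like this (a sketch; the variable names are illustrative, only the `[time, mag, err]` layout comes from the upstream code):

```python
import numpy as np

# Illustrative observations (times in days, magnitudes).
time = np.array([0.0, 1.5, 3.2], dtype=np.float32)
mag = np.array([19.1, 19.4, 18.9], dtype=np.float32)

# The encoder extracts but never uses the error column, so zeros are safe.
lc = np.column_stack([time, mag, np.zeros_like(time)])  # shape [n_obs, 3]
```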
1. **Collect** observation times (in days — need not be absolute MJD) and magnitudes.
2. **Truncate** each light curve to at most 200 observations (take the first 200 if longer).
3. **Zero-mean normalize** both columns over the window:
   `time -= time.mean()`, `mag -= mag.mean()`
4. **Pad** shorter light curves to exactly 200 positions: append zeros to both `input` and `times`.
5. **Build the mask**: set `mask_in = 1` for real observations, `mask_in = 0` for padded positions.
6. **Reshape** each tensor to `[batch, 200, 1]` (add trailing dimension).
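The six steps above can be sketched as a single NumPy helper (an illustrative implementation, not the upstream code; the function name is ours):

```python
import numpy as np

SEQ_LEN = 200  # fixed by the pretrained weights

def preprocess(time, mag, seq_len=SEQ_LEN):
    """Turn one light curve into the `input`, `times`, `mask_in` tensors."""
    time = np.asarray(time, dtype=np.float32)
    mag = np.asarray(mag, dtype=np.float32)
    # 2. Truncate to at most seq_len observations (first 200 if longer).
    time, mag = time[:seq_len], mag[:seq_len]
    # 3. Zero-mean normalize both columns over the window.
    time = time - time.mean()
    mag = mag - mag.mean()
    # 4. Pad shorter light curves with zeros up to seq_len positions.
    n = len(time)
    inp = np.pad(mag, (0, seq_len - n))
    tms = np.pad(time, (0, seq_len - n))
    # 5. Mask: 1 for real observations, 0 for padded positions.
    mask = np.zeros(seq_len, dtype=np.float32)
    mask[:n] = 1.0
    # 6. Reshape to [batch, seq_len, 1] with batch = 1.
    return (inp.reshape(1, seq_len, 1),
            tms.reshape(1, seq_len, 1),
            mask.reshape(1, seq_len, 1))
```

The returned arrays can be fed to the ONNX session as `input`, `times`, and `mask_in` respectively.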

The sequence length is fixed at 200 by the pretrained weights.

Source: [Zenodo record 18207945](https://zenodo.org/records/18207945)
Training dataset: MACHO R-band light curves
Checkpoint: `pt_macho_v1_2021.zip`

The test-data parquet file was generated with these MACHO weights using truncation to the first 200 observations.