hombit committed
Commit 464d155 · verified · Parent: 1ae415d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +16 -7
README.md CHANGED
@@ -46,13 +46,13 @@ head dimension of 64, producing 256-dimensional embeddings.
 
 ## Inputs
 
-All tensors are `float32`. Magnitudes must be **zero-mean normalized** before
-passing to the model (subtract the per-light-curve mean magnitude).
+All tensors are `float32`. Both magnitudes and times are **zero-mean normalized** before
+passing to the model (subtract the per-window mean of each).
 
 | Tensor | Shape | Description |
 |--------|-------|-------------|
-| `input` | `[batch, 200, 1]` | Zero-mean normalized magnitudes |
-| `times` | `[batch, 200, 1]` | Observation times in MJD |
+| `input` | `[batch, 200, 1]` | `mag − mean(mag)` over the window |
+| `times` | `[batch, 200, 1]` | `time − mean(time)` over the window |
 | `mask_in` | `[batch, 200, 1]` | 1 = valid observation, 0 = padded position |
 
 ## Outputs (ONNX)
@@ -67,9 +67,15 @@ ONNX opset: 13.
 
 ## Preprocessing steps
 
-1. **Collect** MJD observation times and magnitudes for each light curve.
-2. **Zero-mean normalize** magnitudes: subtract the mean magnitude of each light curve individually (`mag -= mag.mean()`).
-3. **Truncate** each light curve to at most 200 observations (take the first 200 if longer).
+Photometric errors are **not used** at inference; only time and magnitude are needed.
+The upstream code internally expects a 3-column `[time, mag, err]` array, but the error
+column is dead code in the encoder (extracted but never used). Pass dummy zeros if
+running the pipeline directly.
+
+1. **Collect** observation times (in days; they need not be absolute MJD) and magnitudes.
+2. **Truncate** each light curve to at most 200 observations (take the first 200 if longer).
+3. **Zero-mean normalize** both columns over the window:
+   `time -= time.mean()`, `mag -= mag.mean()`
 4. **Pad** shorter light curves to exactly 200 positions: append zeros to both `input` and `times`.
 5. **Build the mask**: set `mask_in = 1` for real observations, `mask_in = 0` for padded positions.
 6. **Reshape** each tensor to `[batch, 200, 1]` (add trailing dimension).
@@ -80,3 +86,6 @@ The sequence length is fixed at 200 by the pretrained weights.
 
 Source: [Zenodo record 18207945](https://zenodo.org/records/18207945)
 Training dataset: MACHO R-band light curves
+Checkpoint: `pt_macho_v1_2021.zip`
+
+The test-data parquet file was generated with these MACHO weights using truncation to the first 200 observations.
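A matching inference sketch with onnxruntime, reusing `preprocess` from above. The file name `model.onnx` is an assumption (use the actual ONNX file from this repository), and the single 256-dimensional output follows the model description in the diff header.

```python
import numpy as np
import onnxruntime as ort

# Hypothetical light curve; substitute real observation times and magnitudes.
rng = np.random.default_rng(0)
time = np.sort(rng.uniform(0.0, 1000.0, size=150))
mag = rng.normal(19.0, 0.3, size=150)

# "model.onnx" is an assumed file name, not confirmed by the model card.
session = ort.InferenceSession("model.onnx")
outputs = session.run(None, preprocess(time, mag))  # None = return all outputs
print(outputs[0].shape)  # expected (1, 256) per the model description
```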
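If you drive the upstream pipeline directly instead of the ONNX graph, the dead error column mentioned in the diff can simply be zero-filled. This sketch shows only the 3-column array shape, with `time` and `mag` as in the examples above; the upstream entry point itself is not shown here.

```python
import numpy as np

# The err column is extracted but never used by the encoder, so zeros suffice.
arr = np.stack([time, mag, np.zeros_like(mag)], axis=1)  # shape [n_obs, 3]
```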