hombit committed on
Commit c8d9fb9 · verified · 1 Parent(s): b718902

Upload README.md with huggingface_hub

  1. README.md +82 -0
README.md ADDED
---
license: mit
tags:
- astronomy
- time-series
- light-curves
- onnx
library_name: onnx
---

# Astromer 1

## Paper

Donoso-Oliva, C., Becker, I., Protopapas, P., Cabrera-Vives, G., Forster, F., & Estévez, P. A. (2023). *ASTROMER: A transformer-based embedding for the representation of light curves*. Astronomy & Astrophysics, 670, A54.

```bibtex
@article{astromer1,
  author  = {Donoso-Oliva, C. and Becker, I. and Protopapas, P. and
             Cabrera-Vives, G. and Forster, F. and Est{\'e}vez, P. A.},
  title   = {{ASTROMER}: A transformer-based embedding for the representation
             of light curves},
  journal = {Astronomy \& Astrophysics},
  volume  = {670},
  pages   = {A54},
  year    = {2023},
  doi     = {10.1051/0004-6361/202243928},
}
```

## Original code

<https://github.com/astromer-science/main-code> (Astromer v1 tag)

## License

MIT; see [LICENSE](LICENSE).

## Model overview

Astromer 1 is a transformer encoder pretrained on MACHO R-band light curves via
masked magnitude prediction. It maps irregularly sampled photometric time series
to per-timestep contextual embeddings, using a sinusoidal positional encoding
computed from the observation times (MJD) rather than from integer positions.
The architecture has 2 transformer layers, each with 4 attention heads of
dimension 64, so the embeddings are 4 × 64 = 256-dimensional.
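
To illustrate the idea of a time-based positional encoding, here is a minimal NumPy sketch of the standard sinusoidal formula evaluated at observation times instead of integer positions. It is not taken from the Astromer code: the function name, the `base` value, and the interleaved sin/cos channel layout are assumptions, and the released model may differ in all three.

```python
import numpy as np


def time_positional_encoding(times, d_model=256, base=10000.0):
    """Sinusoidal encoding evaluated at observation times.

    times: 1-D sequence of observation times (e.g. MJD), length seq_len.
    Returns a float32 array of shape [seq_len, d_model].
    NOTE: `base` and the channel layout are assumptions for illustration.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    # float64 keeps full precision for MJD values of order 50000.
    times = np.asarray(times, dtype=np.float64)
    # One angular frequency per (sin, cos) pair of channels.
    pair_index = np.arange(d_model // 2)
    freqs = 1.0 / base ** (2.0 * pair_index / d_model)
    angles = times[:, None] * freqs[None, :]  # [seq_len, d_model // 2]
    encoding = np.empty((times.shape[0], d_model), dtype=np.float32)
    encoding[:, 0::2] = np.sin(angles)  # even channels
    encoding[:, 1::2] = np.cos(angles)  # odd channels
    return encoding


pe = time_positional_encoding([0.0, 1.5, 40.0])
print(pe.shape)
```

Because the angles depend on the actual time stamps, two observations separated by a long gap get encodings that differ more than two observations taken close together, which an index-based encoding cannot express.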

## Inputs

All tensors are `float32`. Magnitudes must be **zero-mean normalized** before
being passed to the model (subtract the per-light-curve mean magnitude).

| Tensor | Shape | Description |
|--------|-------|-------------|
| `input` | `[batch, 200, 1]` | Zero-mean normalized magnitudes |
| `times` | `[batch, 200, 1]` | Observation times in MJD |
| `mask_in` | `[batch, 200, 1]` | 1 = valid observation, 0 = padded position |
58
+ ## Outputs (ONNX)
59
+
60
+ | File | Output shape | Aggregation |
61
+ |------|-------------|-------------|
62
+ | `astromer1_mean.onnx` | `[batch, 256]` | Masked mean pooling over valid timesteps |
63
+ | `astromer1_max.onnx` | `[batch, 256]` | Masked max pooling over valid timesteps |
64
+ | `astromer1_full.onnx` | `[batch, 200, 256]` | Full per-timestep sequence |
65
+
66
+ ONNX opset: 13.
67
+
68
+ ## Preprocessing steps
69
+
70
+ 1. **Collect** MJD observation times and magnitudes for each light curve.
71
+ 2. **Zero-mean normalize** magnitudes: subtract the mean magnitude of each light curve individually (`mag -= mag.mean()`).
72
+ 3. **Truncate** each light curve to at most 200 observations (take the first 200 if longer).
73
+ 4. **Pad** shorter light curves to exactly 200 positions: append zeros to both `input` and `times`.
74
+ 5. **Build the mask**: set `mask_in = 1` for real observations, `mask_in = 0` for padded positions.
75
+ 6. **Reshape** each tensor to `[batch, 200, 1]` (add trailing dimension).
76
+
77
+ The sequence length is fixed at 200 by the pretrained weights.
78
+
79
+ ## Weights
80
+
81
+ Source: [Zenodo record 18207945](https://zenodo.org/records/18207945)
82
+ Training dataset: MACHO R-band light curves