# AirTrackLM: LLM4STP Adapted for ADS-B Air Track Prediction

## Complete Architecture & Implementation Plan

---

## 1. Executive Summary

We adapt the LLM4STP multi-feature fusion architecture (originally for maritime AIS ship trajectory prediction) to work with **ADS-B air track data**. The model uses a **decoder-only transformer** with four specialized embedding types (Prompt, Uncertainty, Geohash, and Temporal) fused together for **next-state prediction** pretraining. Once pretrained, the model is adaptable to downstream tasks such as activity classification.

This design is grounded in published results from:
- **FTP-LLM** (arXiv:2501.17459): LLaMA-3.1-8B for flight trajectory prediction
- **H3-CLM** (arXiv:2405.09596): H3 geohash + causal LM for maritime trajectories
- **GeoFormer** (arXiv:2311.05092): GPT-style geospatial tokenization
- **TrAISFormer** (arXiv:2109.03958): discrete tokenization of AIS features

---

## 2. System Architecture Overview

```
+---------------------------------------------------------------+
|                        RAW ADS-B INPUT                        |
|           (timestamp, latitude, longitude, altitude)          |
+-------------------------------+-------------------------------+
                                |
                                v
+---------------------------------------------------------------+
|                  FEATURE DERIVATION PIPELINE                  |
|                                                               |
|   Raw:     lat, lon, alt                                      |
|   Derived: COG, SOG, ROT, altitude_rate                       |
|   Meta:    timestamp -> (hour, day_of_week, month)            |
|                                                               |
|   Output per timestep:                                        |
|   state_t = [lat, lon, alt, COG, SOG, ROT, alt_rate]          |
+-------------------------------+-------------------------------+
                                |
                                v
+---------------------------------------------------------------+
|                    TOKENIZATION / ENCODING                    |
|                                                               |
|   Geohash Tokenizer:      lat, lon, alt -> H3 cell + alt_band |
|   Continuous Discretizer: COG, SOG, ROT, alt_rate -> bin IDs  |
|   Temporal Encoder:       hour, dow, month -> time IDs        |
|                                                               |
|   Each token stream has its own embedding table (d_model).    |
+-------------------------------+-------------------------------+
                                |
                                v
+---------------------------------------------------------------+
|                    EMBEDDING FUSION LAYER                     |
|                                                               |
|   Geohash, Feature, Temporal, and Uncertainty embeddings      |
|   (each d_model) are summed per timestep:                     |
|                                                               |
|     E_state = E_geo + E_feat + E_temp + E_uncert              |
|                                                               |
|   Prompt embedding (prepended prefix):                        |
|     [PROMPT_1, PROMPT_2, ..., PROMPT_k]                       |
|                                                               |
|   Input: [PROMPT_TOKENS | STATE_1 | STATE_2 | ... | STATE_T]  |
|     -> Linear projection to d_model                           |
|     -> + Positional encoding (sinusoidal)                     |
+-------------------------------+-------------------------------+
                                |
                                v
+---------------------------------------------------------------+
|              DECODER-ONLY TRANSFORMER BACKBONE                |
|                                                               |
|   Transformer block x N_layers, each:                         |
|     1. Causal multi-head self-attention (masked: each         |
|        position attends only to itself and earlier positions) |
|     2. LayerNorm + residual connection                        |
|     3. Feed-forward network (Linear -> GELU -> Linear),       |
|        d_model -> 4*d_model -> d_model                        |
|     4. LayerNorm + residual connection                        |
+-------------------------------+-------------------------------+
                                |
                                v
+---------------------------------------------------------------+
|                         OUTPUT HEADS                          |
|                                                               |
|   PRETRAINING: next-state prediction head.                    |
|   For each position t, predict the state at t+1:              |
|                                                               |
|     h_t -> Linear -> softmax -> P(geohash_token_{t+1})        |
|     h_t -> Linear -> softmax -> P(COG_bin_{t+1})              |
|     h_t -> Linear -> softmax -> P(SOG_bin_{t+1})              |
|     h_t -> Linear -> softmax -> P(ROT_bin_{t+1})              |
|     h_t -> Linear -> softmax -> P(alt_rate_bin_{t+1})         |
|     h_t -> Linear -> softmax -> P(alt_band_{t+1})             |
|                                                               |
|   Loss = sum over heads of CrossEntropy(predicted, true)      |
|                                                               |
|   DOWNSTREAM: activity classification head                    |
|   (attached after pretraining, frozen or fine-tuned):         |
|                                                               |
|     h_[BOS] or mean(h_1:T) -> MLP -> softmax -> class label   |
+---------------------------------------------------------------+
```

---

## 3. The Four Embedding Types (Detailed)

### 3.1 Geohash Embeddings: Spatial Position Encoding

**Purpose**: Encode the aircraft's 3D geographic position as a discrete token.

**Method**: We use the **H3 hexagonal hierarchical spatial index** (Uber's H3) at resolution 5 (hex area ≈ 252 km², edge ≈ 9.85 km) for en-route flight, with an option to use resolution 7 (≈ 5.16 km², edge ≈ 1.22 km) for terminal areas. This follows the H3-CLM paper's approach, adapted for aviation's larger spatial scale.

**3D Extension**: Since aircraft operate in 3D, we combine the H3 cell with an **altitude band**:
```
Geohash Token = H3_cell_index × N_alt_bands + alt_band_index

Altitude bands (1000 ft increments):
  Band 0:     0 - 1,000 ft    (ground / taxi)
  Band 1:  1,000 - 2,000 ft   (initial climb / approach)
  ...
  Band 45: 44,000 - 45,000 ft (high cruise)

  N_alt_bands = 46
```

**Vocabulary size**: At H3 resolution 5, the number of unique cells covering typical airspace is ~100K-200K. With altitude bands: `~200K × 46 ≈ 9.2M`, which is too large for a direct embedding table.

**Solution β€” Factored Embedding**:
```
E_geohash = E_h3[h3_cell_id] + E_alt[alt_band_id]

E_h3:  learned embedding table, vocab = N_h3_cells (~200K or hashing trick to 50K)
E_alt: learned embedding table, vocab = 46

Both project to d_model dimensions.
```

The **hashing trick**: Map H3 cell indices through a hash function to a fixed vocabulary of ~50,000 buckets. This bounds memory while maintaining spatial discrimination.

**Why H3 over traditional geohash**: H3 hexagons have near-uniform area (no polar distortion), hierarchical nesting, and consistent neighbor relationships, all of which matter for trajectory continuity.
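
The factored lookup and the hashing trick can be sketched as follows. This is a minimal NumPy stand-in: the tables would be learned `nn.Embedding` layers in PyTorch, and the hash multiplier and example H3 index are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

N_BUCKETS = 50_000    # hashed H3 vocabulary (per the plan above)
N_ALT_BANDS = 46
D_MODEL = 256

# Random stand-ins for the learned embedding tables.
E_h3 = rng.normal(0, 0.02, size=(N_BUCKETS, D_MODEL))
E_alt = rng.normal(0, 0.02, size=(N_ALT_BANDS, D_MODEL))

def alt_band(alt_ft):
    """Map altitude in feet to a 1,000 ft band, clipped to [0, 45]."""
    return int(np.clip(alt_ft // 1000, 0, N_ALT_BANDS - 1))

def h3_bucket(h3_index):
    """Hashing trick: fold a 64-bit H3 cell index into a fixed bucket range."""
    return (h3_index * 0x9E3779B97F4A7C15 % (1 << 64)) % N_BUCKETS

def geohash_embedding(h3_index, alt_ft):
    """Factored 3D position embedding: E_h3[bucket] + E_alt[band]."""
    return E_h3[h3_bucket(h3_index)] + E_alt[alt_band(alt_ft)]

e = geohash_embedding(0x85283473FFFFFFF, 35_000)  # example res-5 cell, FL350
```

The multiplicative hash keeps the table bounded at 50K rows; collisions between distant cells are tolerated because the kinematic and temporal embeddings disambiguate them.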

### 3.2 Temporal Embeddings: When Is the Aircraft Flying?

**Purpose**: Encode temporal context; time of day affects traffic density, routes, and behavior.

**Method**: Additive composition of multiple temporal scales:
```
E_temporal = E_hour[hour_of_day] + E_dow[day_of_week] + E_month[month]

E_hour:  24 entries  (captures rush hour vs. night patterns)
E_dow:    7 entries  (weekday vs. weekend traffic)
E_month: 12 entries  (seasonal routes, weather patterns)

All project to d_model dimensions.
```

**Optional β€” Sinusoidal Sub-minute Encoding**: For sub-minute resolution:
```
E_minute = [sin(2π × minute / 60), cos(2π × minute / 60)]  →  linear  →  d_model
```

### 3.3 Uncertainty Embeddings: How Confident Are We?

**Purpose**: Encode the model's uncertainty about the current trajectory state. Aircraft in straight-and-level cruise have low uncertainty; aircraft maneuvering near airports have high uncertainty.

**Method**: Compute a **trajectory smoothness score** from recent states, then discretize:

```
Uncertainty sources (sliding window of k=5 recent states):

1. Position variance:      σ²_pos = var(Δlat) + var(Δlon)
2. Heading variance:       σ²_COG = circular_var(COG_{t-k:t})
3. Speed variance:         σ²_SOG = var(SOG_{t-k:t})
4. Vertical-rate variance: σ²_alt = var(alt_rate_{t-k:t})

Combined uncertainty score:
  U_t = w1·σ²_pos + w2·σ²_COG + w3·σ²_SOG + w4·σ²_alt

Discretize into N_uncert = 16 bins (quantile binning on training data)

E_uncertainty = E_uncert_table[bin(U_t)]   →  d_model
```

**Weights w1-w4**: Hyperparameters tuned on validation data, or learned as part of the model.

**During inference**: For multi-step prediction, uncertainty can be updated using MC-Dropout or ensemble disagreement.
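
A minimal sketch of the score and the quantile binning, assuming unit weights. `circular_var` here is the standard "1 minus mean resultant length" definition; the bin edges would be fit once on training-set scores and reused at inference.

```python
import numpy as np

def circular_var(deg):
    """Circular variance of angles in degrees: 1 - |mean resultant vector|."""
    rad = np.radians(np.asarray(deg, dtype=float))
    return 1.0 - np.hypot(np.cos(rad).mean(), np.sin(rad).mean())

def uncertainty_score(dlat, dlon, cog, sog, alt_rate, w=(1.0, 1.0, 1.0, 1.0)):
    """Combined uncertainty U_t over one sliding window of k recent states."""
    return (w[0] * (np.var(dlat) + np.var(dlon))
            + w[1] * circular_var(cog)
            + w[2] * np.var(sog)
            + w[3] * np.var(alt_rate))

def fit_bins(scores, n_bins=16):
    """Quantile bin edges, fit once on training-set scores."""
    return np.quantile(scores, np.linspace(0, 1, n_bins + 1)[1:-1])

def uncert_bin(score, edges):
    return int(np.searchsorted(edges, score))

# Straight-and-level cruise: constant deltas -> near-zero uncertainty.
steady = uncertainty_score([0.01]*5, [0.01]*5, [90]*5, [450]*5, [0]*5)
# Maneuvering: swinging heading, speed, and vertical rate -> larger score.
maneuver = uncertainty_score([0.01, -0.02, 0.03, 0.0, -0.01],
                             [0.02, 0.01, -0.03, 0.02, 0.0],
                             [90, 140, 200, 260, 330],
                             [250, 230, 210, 240, 220],
                             [-500, 800, 0, -1200, 300])
```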

### 3.4 Prompt Embeddings: Task and Context Metadata

**Purpose**: Provide metadata context about the flight, analogous to system prompts in LLMs. Enables task conditioning and multi-task learning.

**Method**: Learnable prompt tokens prepended to the trajectory:

```
Prompt token vocabulary:
  - Aircraft category:  [HEAVY, LARGE, SMALL, ROTORCRAFT, GLIDER, UAV, UNKNOWN]  (7)
  - Flight phase:       [CLIMB, CRUISE, DESCENT, APPROACH, GROUND, UNKNOWN]       (6)
  - Region:             [CONUS, EUROPE, ASIA, OTHER]                               (4)
  - Task:               [PREDICT, CLASSIFY, DETECT_ANOMALY]                        (3)
  - Special:            [BOS, EOS, PAD, MASK]                                      (4)

Total prompt vocab: ~24 tokens

Prompt sequence (prepended):
  [BOS, TASK_TOKEN, AIRCRAFT_TOKEN, PHASE_TOKEN, REGION_TOKEN]

Each has a learned embedding of dimension d_model.
```

**For downstream classification**: set TASK_TOKEN to CLASSIFY; the output at the BOS position is used for classification.
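
Prompt construction under this vocabulary can be sketched as follows. The concrete IDs are illustrative, and the two UNKNOWN entries are disambiguated here (UNKNOWN_AC, UNKNOWN_PHASE) so the flat vocabulary stays at 24 tokens.

```python
# Flat prompt vocabulary; in the model each entry has a learned d_model vector.
PROMPT_VOCAB = [
    "BOS", "EOS", "PAD", "MASK",                                   # special (4)
    "HEAVY", "LARGE", "SMALL", "ROTORCRAFT", "GLIDER", "UAV",
    "UNKNOWN_AC",                                                  # aircraft (7)
    "CLIMB", "CRUISE", "DESCENT", "APPROACH", "GROUND",
    "UNKNOWN_PHASE",                                               # phase (6)
    "CONUS", "EUROPE", "ASIA", "OTHER",                            # region (4)
    "PREDICT", "CLASSIFY", "DETECT_ANOMALY",                       # task (3)
]
TOK = {name: i for i, name in enumerate(PROMPT_VOCAB)}

def prompt_ids(task, aircraft, phase, region):
    """[BOS, TASK, AIRCRAFT, PHASE, REGION] as embedding-table indices."""
    return [TOK["BOS"], TOK[task], TOK[aircraft], TOK[phase], TOK[region]]

ids = prompt_ids("PREDICT", "LARGE", "CRUISE", "EUROPE")
```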

---

## 4. Feature Derivation Pipeline

### 4.1 Raw Input
```
timestamp (Unix epoch seconds)
latitude  (degrees, WGS84)
longitude (degrees, WGS84)
altitude  (feet, barometric or geometric)
```

### 4.2 Derived Features

```python
import numpy as np

def derive_features(timestamps, lats, lons, alts):
    """
    Derive COG, SOG, ROT, and altitude rate from raw position data.
    All inputs: numpy arrays of shape (N,) for a single trajectory.
    Returns arrays of shape (N,); the first element is NaN (first two for ROT).
    """
    dt = np.diff(timestamps)  # seconds
    dt = np.maximum(dt, 1e-6)  # avoid division by zero
    
    # --- Course Over Ground (COG) ---
    lat1, lat2 = np.radians(lats[:-1]), np.radians(lats[1:])
    dlon = np.radians(np.diff(lons))
    
    x = np.sin(dlon) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(dlon)
    COG = np.degrees(np.arctan2(x, y)) % 360  # [0, 360)
    
    # --- Speed Over Ground (SOG) ---
    dlat = np.radians(np.diff(lats))
    a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    distance_nm = 3440.065 * c  # Earth radius in nautical miles
    SOG = distance_nm / (dt / 3600)  # knots
    
    # --- Rate of Turn (ROT) ---
    dCOG = np.diff(COG)
    dCOG = (dCOG + 180) % 360 - 180  # normalize to [-180, 180]
    ROT = np.full(len(lats), np.nan)
    ROT[2:] = dCOG / dt[1:]  # degrees per second
    
    # --- Rate of Altitude Change ---
    dalt = np.diff(alts)  # feet
    alt_rate = dalt / (dt / 60)  # feet per minute
    
    # Pad first elements
    COG_full = np.concatenate([[np.nan], COG])
    SOG_full = np.concatenate([[np.nan], SOG])
    alt_rate_full = np.concatenate([[np.nan], alt_rate])
    
    return COG_full, SOG_full, ROT, alt_rate_full
```
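
A quick numerical sanity check of the bearing and haversine formulas above: two fixes 60 s apart and one arcminute of latitude apart cover 1 NM of ground track due north, so SOG should come out near 60 kt and COG near 0°.

```python
import numpy as np

# Two fixes, 60 s apart, 1 arcminute due north of each other.
lat1, lon1 = 47.0, 8.0
lat2, lon2 = 47.0 + 1/60, 8.0
dt = 60.0

p1, p2 = np.radians(lat1), np.radians(lat2)
dlon = np.radians(lon2 - lon1)

# Initial bearing (same formula as derive_features)
x = np.sin(dlon) * np.cos(p2)
y = np.cos(p1) * np.sin(p2) - np.sin(p1) * np.cos(p2) * np.cos(dlon)
cog = np.degrees(np.arctan2(x, y)) % 360

# Haversine distance in nautical miles (R = 3440.065 NM)
dlat = p2 - p1
a = np.sin(dlat/2)**2 + np.cos(p1) * np.cos(p2) * np.sin(dlon/2)**2
dist_nm = 3440.065 * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
sog = dist_nm / (dt / 3600)  # knots
```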

### 4.3 Feature Discretization

| Feature       | Range             | Bin Width  | N_bins | Notes             |
|---------------|-------------------|------------|--------|-------------------|
| COG           | [0, 360)          | 5°         | 72     | Circular          |
| SOG           | [0, 600] kts      | 5 knots    | 121    | Capped at ~Mach 1 |
| ROT           | [-6, 6] °/s       | 0.25 °/s   | 49     | Capped ±6 °/s     |
| Altitude Rate | [-6000, 6000] fpm | 200 ft/min | 61     | Capped ±6000 fpm  |

Outliers beyond the caps are clipped to the boundary bin.
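
The binning can be sketched as follows; with the ranges and widths from the table it reproduces the stated bin counts (72/121/49/61), treating COG as circular and clipping the other features to their caps.

```python
import numpy as np

def bin_feature(values, lo, hi, width):
    """Clip to [lo, hi], then map to integer bin IDs of the given width.
    Values at the upper cap fall into the last (boundary) bin."""
    v = np.clip(np.asarray(values, dtype=float), lo, hi)
    last = int((hi - lo) // width)
    return np.minimum(((v - lo) // width).astype(int), last)

def bin_cog(cog_deg):
    """Circular feature: wrap into [0, 360) before 5-degree binning."""
    return (np.asarray(cog_deg, dtype=float) % 360 // 5).astype(int)
```

For example, `bin_feature(sog, 0, 600, 5)` yields IDs 0..120 (121 bins) and `bin_feature(rot, -6, 6, 0.25)` yields 0..48 (49 bins), matching the table.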

### 4.4 Trajectory Preprocessing Pipeline

```
1. Segment raw ADS-B by ICAO24 + temporal gaps > 15 min → individual flights
2. Resample to fixed Δt = 60 seconds (linear interp for position, circular for heading)
3. Derive features (COG, SOG, ROT, alt_rate)
4. Drop first 2 points per trajectory (NaN from derivation)
5. Filter: remove trajectories with < 20 points (< 20 minutes)
6. Compute H3 cell (res 5) + altitude band for each point
7. Discretize all continuous features into bins
8. Compute uncertainty scores (sliding window k=5)
9. Extract temporal features (hour, dow, month)
10. Construct prompt tokens from metadata (if available)
```
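
Step 1 (gap-based segmentation) can be sketched as follows, assuming one aircraft's timestamps are already sorted:

```python
import numpy as np

def segment_by_gaps(timestamps, gap_s=15 * 60):
    """Split one aircraft's sorted timestamps into flights wherever the
    gap between consecutive reports exceeds 15 minutes.
    Returns a list of index arrays, one per flight segment."""
    t = np.asarray(timestamps, dtype=float)
    breaks = np.where(np.diff(t) > gap_s)[0] + 1
    return np.split(np.arange(len(t)), breaks)

# One aircraft: two flights separated by a 2-hour gap.
ts = [0, 60, 120, 180, 180 + 7200, 180 + 7260]
segments = segment_by_gaps(ts)
```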

---

## 5. Model Hyperparameters

### 5.1 Model Dimensions

| Parameter        | Value  | Rationale                                          |
|------------------|--------|----------------------------------------------------|
| d_model          | 256    | H3-CLM found 256-1024 effective                    |
| n_heads          | 8      | head_dim = 32                                       |
| n_layers         | 8      | Moderate depth for ~10M param model                 |
| d_ff             | 1024   | 4× d_model (standard)                              |
| max_seq_len      | 128    | 128 states × 60 s ≈ 2 hours of flight              |
| n_prompt_tokens  | 5      | [BOS, TASK, AIRCRAFT, PHASE, REGION]                |
| dropout          | 0.1    |                                                     |

**Total parameters**: ~8-12M (trainable on a single GPU in hours)

### 5.2 Vocabulary Sizes

| Embedding        | Vocab  | Dim |
|------------------|--------|-----|
| H3 cells         | 50,000 | 256 |
| Altitude bands   | 46     | 256 |
| COG bins         | 72     | 256 |
| SOG bins         | 121    | 256 |
| ROT bins         | 49     | 256 |
| Alt rate bins    | 61     | 256 |
| Hour of day      | 24     | 256 |
| Day of week      | 7      | 256 |
| Month            | 12     | 256 |
| Uncertainty bins | 16     | 256 |
| Prompt tokens    | 24     | 256 |

### 5.3 State Token Composition

Each timestep β†’ single state token via additive fusion:

```
E_state_t = E_h3[h3_id_t] + E_alt_band[alt_band_t]            # Geohash (3D position)
          + E_COG[cog_bin_t] + E_SOG[sog_bin_t]                # Kinematics
          + E_ROT[rot_bin_t] + E_alt_rate[alt_rate_bin_t]       # Dynamics
          + E_hour[hour_t] + E_dow[dow_t] + E_month[month_t]   # Temporal
          + E_uncert[uncert_bin_t]                               # Uncertainty

E_state_t ∈ R^{d_model}
```

This additive fusion follows BERT (token + segment + position) and TrAISFormer.
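
The additive fusion can be sketched with NumPy stand-ins for the learned tables (vocabulary sizes from Section 5.2); in PyTorch each table would be an `nn.Embedding` and the lookups would be batched.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 256

# Random stand-ins for the ten learned embedding tables (Section 5.2 sizes).
tables = {name: rng.normal(0, 0.02, size=(vocab, D)) for name, vocab in [
    ("h3", 50_000), ("alt_band", 46), ("cog", 72), ("sog", 121),
    ("rot", 49), ("alt_rate", 61), ("hour", 24), ("dow", 7),
    ("month", 12), ("uncert", 16),
]}

def state_embedding(ids):
    """Additive fusion: one d_model vector per timestep, summed over all
    per-feature lookups (geohash, kinematics, temporal, uncertainty)."""
    return sum(tables[name][ids[name]] for name in tables)

e = state_embedding({"h3": 1234, "alt_band": 35, "cog": 18, "sog": 90,
                     "rot": 24, "alt_rate": 30, "hour": 14, "dow": 2,
                     "month": 6, "uncert": 3})
```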

---

## 6. Training Recipe

### 6.1 Pretraining: Next-State Prediction (Causal LM)

**Objective**: Given states 1..T, predict state at T+1 (applied autoregressively at every position).

**Loss**:
```
L = Σ_{t=1}^{T-1} [ λ_geo · CE(ŷ_geo_t, y_geo_{t+1})
                    + λ_COG · CE(ŷ_COG_t, y_COG_{t+1})
                    + λ_SOG · CE(ŷ_SOG_t, y_SOG_{t+1})
                    + λ_ROT · CE(ŷ_ROT_t, y_ROT_{t+1})
                    + λ_alt · CE(ŷ_alt_rate_t, y_alt_rate_{t+1})
                    + λ_altb · CE(ŷ_alt_band_t, y_alt_band_{t+1}) ]

λ values default to 1.0 (equal weighting).
```
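
A minimal single-position sketch of the multi-head loss; in training this would be the framework's batched cross-entropy over all positions, but the structure is the same:

```python
import numpy as np

def cross_entropy(logits, target):
    """Single-example CE with a numerically stable log-softmax."""
    z = logits - logits.max()
    return float(np.log(np.exp(z).sum()) - z[target])

def next_state_loss(head_logits, targets, lam=None):
    """Sum of per-head CE at one position; heads keyed by feature name."""
    lam = lam or {k: 1.0 for k in head_logits}
    return sum(lam[k] * cross_entropy(head_logits[k], targets[k])
               for k in head_logits)

rng = np.random.default_rng(2)
logits = {"geo": rng.normal(size=50_000), "cog": rng.normal(size=72),
          "sog": rng.normal(size=121), "rot": rng.normal(size=49),
          "alt_rate": rng.normal(size=61), "alt_band": rng.normal(size=46)}
targets = {"geo": 7, "cog": 10, "sog": 20, "rot": 5,
           "alt_rate": 30, "alt_band": 35}
loss = next_state_loss(logits, targets)
```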

**Training hyperparameters** (based on FTP-LLM + H3-CLM):

| Parameter            | Value               |
|----------------------|---------------------|
| Optimizer            | AdamW               |
| Learning rate        | 5e-4                |
| LR Schedule          | Cosine + 5% warmup  |
| Batch size (per GPU) | 64                  |
| Gradient accumulation| 4 (effective = 256) |
| Max epochs           | 30 (early stopping, patience=5) |
| Weight decay         | 0.01                |
| Gradient clipping    | 1.0                 |
| Mixed precision      | bf16                |

**Data windowing**: Sliding window size=128, stride=64 (50% overlap).
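
The windowing can be sketched as start indices; a window is only emitted when all 128 states fit within the trajectory:

```python
def sliding_windows(n_points, size=128, stride=64):
    """Start indices of full windows over a trajectory of n_points states.
    stride=64 with size=128 gives 50% overlap between consecutive windows."""
    if n_points < size:
        return []
    return list(range(0, n_points - size + 1, stride))

starts = sliding_windows(300)
```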

### 6.2 Downstream: Activity Classification

After pretraining, attach classification head:
```
h_BOS → Linear(256, 128) → GELU → Dropout(0.1) → Linear(128, N_classes)
```

**Fine-tuning options**:
- **A**: Freeze backbone, train head only (fast, small data)
- **B**: Full fine-tune, backbone lr=1e-5, head lr=1e-3

---

## 7. Dataset Strategy

### 7.1 Prototyping: `traffic` Python Library

```python
from traffic.data.samples import landing_zurich_2019
# ~2,000 flights near Zurich
# Columns: timestamp, icao24, callsign, latitude, longitude, altitude,
#          groundspeed, track, vertical_rate, ...
```

Pros: instant access, clean, well-documented. Cons: single airport, limited diversity.

### 7.2 Training: OpenSky Network

```python
from pyopensky.trino import Trino
trino = Trino()
df = trino.rawquery("""
    SELECT time, icao24, lat, lon, baroaltitude, velocity, heading, vertrate
    FROM state_vectors_data4
    WHERE hour >= '2024-01-15 00:00:00'
      AND hour <  '2024-01-15 12:00:00'
      AND lat BETWEEN 40 AND 55
      AND lon BETWEEN -10 AND 20
    ORDER BY icao24, time
""")
```

**Target**:
- **Region A** (train): Europe, 1 month → ~500K-1M flights
- **Region B** (OOD test): US CONUS, 1 week → ~200K flights
- **Region C** (far test): East Asia, 1 week → ~100K flights

### 7.3 Alternative: SCAT Dataset

~170K en-route flights over Sweden, distributed via Zenodo. Pre-segmented and clean.

### 7.4 Data Split

```
Training:    70% of Region A flights
Validation:  15% of Region A flights  
Test (IID):  15% of Region A flights
Test (OOD):  100% of Region B flights
Test (Far):  100% of Region C flights
```

Split by **flight** (not time window) to avoid data leakage.
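
A flight-level split can be made deterministic by hashing the flight ID, so every window from a flight lands in the same partition regardless of processing order; the ID format here is illustrative.

```python
import hashlib

def split_of(flight_id):
    """Deterministic 70/15/15 split keyed on the flight ID, so all windows
    from one flight land in the same partition (no leakage)."""
    h = int(hashlib.sha256(flight_id.encode()).hexdigest(), 16) % 100
    if h < 70:
        return "train"
    return "val" if h < 85 else "test"

# With many flights, all three partitions appear in roughly 70/15/15 ratio.
splits = {split_of(f"icao{i}_2024-01-{i % 28 + 1:02d}") for i in range(1000)}
```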

---

## 8. Ablation Study: Geohash Geographic Dependency

### 8.1 Hypothesis

> Geohash embeddings encode **absolute geographic position**, causing the model to memorize region-specific patterns (airways, approach paths, airspace structure). This improves in-distribution performance but degrades transfer to unseen regions.

### 8.2 Experimental Variants

| Variant | Geohash Type | Description |
|---------|-------------|-------------|
| **V1: Full Model** | H3 absolute | Complete architecture as described |
| **V2: No Geohash** | None | Remove geohash entirely; model sees only kinematics + temporal + uncertainty |
| **V3: Relative Geohash** | H3 relative | H3 cell of (Δlat, Δlon) from trajectory start; position-invariant |
| **V4: Multi-Resolution** | H3 res 3+5+7 | 3 resolutions summed (coarse to fine) |
| **V5: Continuous Position** | Linear projection | `Linear([lat, lon, alt] → d_model)`; no discretization |

### 8.3 Evaluation Metrics

For each variant × each test set (IID, OOD, Far):

| Metric | Description |
|--------|-------------|
| Geo Accuracy | % correct H3 cell prediction |
| Position MAE | Mean absolute error in km |
| COG MAE | Heading error in degrees |
| SOG MAE | Speed error in knots |
| Multi-step ADE | Average displacement error over 5 predicted steps |
| Multi-step FDE | Final displacement error at step 5 |
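Both displacement metrics reduce to great-circle distances between predicted and true positions. A self-contained sketch (function names are illustrative):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    R = 6371.0  # mean Earth radius, km
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * R * asin(sqrt(a))

def ade_fde(pred, true):
    """Average / final displacement error over a predicted horizon.

    pred, true: equal-length lists of (lat, lon), one entry per step.
    ADE averages the per-step errors; FDE keeps only the last step.
    """
    errors = [haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(pred, true)]
    return sum(errors) / len(errors), errors[-1]

pred = [(48.00, 2.00), (48.10, 2.10)]
true = [(48.00, 2.00), (48.12, 2.10)]
ade, fde = ade_fde(pred, true)
```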

### 8.4 Key Comparisons

| Comparison | Tests |
|-----------|-------|
| V1 vs V2 (IID) | How much geohash helps when test = train region |
| V1 vs V2 (OOD) | If V2 > V1 on OOD → geohash causes geographic overfitting |
| V1 vs V3 (OOD) | If V3 good on both IID and OOD → relative geohash is the sweet spot |
| V4 (all) | Multi-resolution: coarse cells transfer, fine cells specialize? |
| V5 (all) | Does continuous encoding avoid discretization issues? |

### 8.5 Expected Outcomes

- **V1**: Best IID, worst OOD (the hypothesis)
- **V3**: Best compromise — the predicted winner
- **V5**: May struggle (loses the discrete token structure transformers excel at)
- **V2**: Strong OOD baseline at the cost of IID accuracy

### 8.6 Additional Analysis

- **Attention visualization**: V1 vs V3 attention patterns
- **Embedding clustering**: t-SNE of geohash embeddings colored by region
- **Learning curves**: IID vs OOD performance vs training data size

---

## 9. Implementation Phases

### Phase 1: Data Pipeline (Week 1)
- Set up `traffic` library, extract sample trajectories
- Implement feature derivation (COG, SOG, ROT, alt_rate)
- Implement H3 geohash encoding + altitude banding
- Implement feature discretization (binning)
- Implement uncertainty score computation
- Build PyTorch Dataset class with sliding window
- Unit tests for all derivation functions
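The COG/SOG derivation in Phase 1 amounts to the initial great-circle bearing and haversine speed between consecutive fixes. A sketch assuming 60 s spacing (`derive_cog_sog` is an illustrative helper, not part of the codebase):

```python
from math import radians, degrees, sin, cos, atan2, asin, sqrt

def derive_cog_sog(lat1, lon1, lat2, lon2, dt_s=60.0):
    """Course over ground (deg, 0-360) and speed over ground (knots)
    derived from two consecutive fixes dt_s seconds apart."""
    p1, p2 = radians(lat1), radians(lat2)
    dlmb = radians(lon2 - lon1)
    # Initial great-circle bearing
    y = sin(dlmb) * cos(p2)
    x = cos(p1) * sin(p2) - sin(p1) * cos(p2) * cos(dlmb)
    cog = (degrees(atan2(y, x)) + 360.0) % 360.0
    # Haversine distance, then distance/time -> speed
    a = sin((p2 - p1) / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    dist_km = 2 * 6371.0 * asin(sqrt(a))
    sog_kt = dist_km / dt_s * 3600.0 / 1.852  # km per dt -> km/h -> knots
    return cog, sog_kt

cog, sog = derive_cog_sog(48.00, 2.00, 48.10, 2.00)  # one minute due north
```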

### Phase 2: Model Architecture (Week 1-2)
- Implement all embedding tables
- Implement additive fusion layer
- Implement prompt token prepending
- Implement decoder-only transformer backbone
- Implement multi-head output (6 prediction heads)
- Implement classification head (for downstream)
- Forward pass test with dummy data
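The additive fusion step above is just a sum of per-feature embedding lookups. A NumPy sketch with toy sizes (all names and dimensions are illustrative; a real implementation would use `nn.Embedding` tables in PyTorch):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8  # toy width; the plan uses a larger d_model

# One embedding table per discretized feature (sizes are illustrative)
tables = {
    "h3_cell":  rng.normal(size=(50_000, d_model)),
    "alt_band": rng.normal(size=(46, d_model)),
    "cog_bin":  rng.normal(size=(72, d_model)),
    "sog_bin":  rng.normal(size=(64, d_model)),
}

def fuse(token: dict) -> np.ndarray:
    """Additive fusion: look up each feature's embedding and sum them.

    The fused vector stays at d_model no matter how many feature streams
    are added, unlike concatenation which grows the width per feature.
    """
    return sum(tables[name][idx] for name, idx in token.items())

x = fuse({"h3_cell": 12345, "alt_band": 30, "cog_bin": 18, "sog_bin": 40})
```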

### Phase 3: Pretraining (Week 2-3)
- Implement training loop with multi-task loss
- Prototyping run on `traffic` data (small, fast iteration)
- Scale to OpenSky data
- Monitor loss curves, validate convergence
- Save best checkpoint

### Phase 4: Downstream Adaptation (Week 3-4)
- Implement classification fine-tuning pipeline
- Test on activity classification task
- Compare frozen vs. fine-tuned backbone

### Phase 5: Ablation Study (Week 4-5)
- Implement all 5 geohash variants
- Train each variant with identical hyperparameters
- Evaluate on IID, OOD, and Far test sets
- Generate comparison tables and visualizations
- Write analysis of geographic dependency findings

---

## 10. Key Design Decisions & Rationale

| Decision | Choice | Why |
|----------|--------|-----|
| Custom model vs. pretrained LLM | Custom ~10M param transformer | FTP-LLM showed text-tokenized LLMs work, but custom allows proper multi-feature fusion. 10M params trains in hours. |
| H3 vs. traditional geohash | H3 | Uniform hexagonal cells, no polar distortion, hierarchical. Proven by H3-CLM. |
| Additive vs. concatenative fusion | Additive | BERT/TrAISFormer paradigm. Keeps d_model constant. Concatenation → d_model × N_features = massive. |
| 60s time resolution | 60 seconds | FTP-LLM validated 1-min aggregation. 128 steps ≈ 2+ hours. |
| Factored geohash (H3 + alt) | Separate tables, summed | Avoids combinatorial explosion (9.2M → 50K + 46). |
| Multi-head output | Separate softmax per feature | More interpretable, allows per-feature analysis. |
| Uncertainty from smoothness | Variance-based | Computable at data time, no inference overhead. |
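One way to realize the variance-based uncertainty score from the table above: take the variance of wrapped step-to-step heading changes over a window and normalize it (the window, the `max_var` threshold, and the helper name are all illustrative):

```python
from statistics import pvariance

def smoothness_uncertainty(headings, max_var=400.0):
    """Score in [0, 1] from the variance of step-to-step heading changes.

    Smooth cruise -> ~0, erratic or noisy track -> ~1. Computed once at
    data-preparation time, so it adds no inference overhead.
    """
    deltas = [(b - a + 180.0) % 360.0 - 180.0  # wrap to (-180, 180]
              for a, b in zip(headings, headings[1:])]
    return min(pvariance(deltas) / max_var, 1.0)

smooth = smoothness_uncertainty([90, 90.5, 91, 91.5, 92])  # steady turn
erratic = smoothness_uncertainty([90, 140, 60, 170, 20])   # noisy track
```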

---

## 11. Risk Analysis

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Geohash overfits to region | High | High | Ablation study; V3 (relative) is fallback |
| OpenSky access issues | Medium | High | Fallback: `traffic` samples + SCAT |
| 60s too coarse for terminal | Medium | Low | Separate terminal model at 10s |
| Model too small | Low | Medium | Scale: d_model→512, n_layers→16 (~40M) |
| Alt discretization too coarse | Low | Low | Refine to 500ft bands (92) |

---

## 12. Monitoring & Evaluation

**During training** (Trackio):
- Total loss + per-feature loss curves
- Validation loss each epoch
- LR schedule, GPU utilization

**After training**:
- Next-state accuracy (top-1, top-5 per feature)
- Position error in km
- Multi-step prediction (1, 5, 10, 20 steps ahead)
- Downstream classification F1/precision/recall
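Per-feature top-k accuracy is a rank check on each head's logits; a pure-Python sketch (names are illustrative):

```python
def topk_accuracy(logits_batch, targets, k=5):
    """Fraction of samples whose true bin is among the k highest logits.

    logits_batch: one list of per-class scores per sample;
    targets: the true class index per sample.
    """
    hits = 0
    for logits, target in zip(logits_batch, targets):
        topk = sorted(range(len(logits)),
                      key=lambda i: logits[i], reverse=True)[:k]
        hits += target in topk
    return hits / len(targets)

logits = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
acc1 = topk_accuracy(logits, [1, 1], k=1)
acc2 = topk_accuracy(logits, [1, 1], k=2)
```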

---

*Grounded in: FTP-LLM, H3-CLM, GeoFormer, TrAISFormer, and LLM4STP (reconstructed). Ready for implementation upon approval.*