AirTrackLM / ARCHITECTURE.md

AirTrackLM: LLM4STP Adapted for ADS-B Air Track Prediction

Complete Architecture & Implementation Plan


1. Executive Summary

We adapt the LLM4STP multi-feature fusion architecture (originally for maritime AIS ship trajectory prediction) to work with ADS-B air track data. The model uses a decoder-only transformer with four specialized embedding types — Prompt, Uncertainty, Geohash, and Temporal — fused together for next-state prediction pretraining. Once pretrained, the model is adaptable to downstream tasks like activity classification.

This design is grounded in published results from:

  • FTP-LLM (arXiv:2501.17459) β€” LLaMA-3.1-8B for flight trajectory prediction
  • H3-CLM (arXiv:2405.09596) β€” H3 geohash + causal LM for maritime trajectories
  • GeoFormer (arXiv:2311.05092) β€” GPT-style geospatial tokenization
  • TrAISFormer (arXiv:2109.03958) β€” Discrete tokenization of AIS features

2. System Architecture Overview

RAW ADS-B INPUT
  (timestamp, latitude, longitude, altitude)
        │
        ▼
FEATURE DERIVATION PIPELINE
  Raw:     lat, lon, alt
  Derived: COG, SOG, ROT, altitude_rate
  Meta:    timestamp → (hour, day_of_week, month)
  Output per timestep:
    state_t = [lat, lon, alt, COG, SOG, ROT, alt_rate]
        │
        ▼
TOKENIZATION / ENCODING
  Geohash tokenizer:      lat, lon, alt → H3 cell + alt_band
  Continuous discretizer: COG, SOG, ROT, alt_rate → bin IDs
  Temporal encoder:       hour, dow, month → time IDs
  Each ID stream feeds its own embedding table (d_model)
        │
        ▼
EMBEDDING FUSION LAYER
  E_state = E_geo + E_feat + E_temp + E_uncert
  Prompt embedding prefix (prepended): [PROMPT_1, PROMPT_2, ..., PROMPT_k]
  Input sequence: [PROMPT_TOKENS | STATE_1 | STATE_2 | ... | STATE_T]
  Linear projection → d_model
  + positional encoding (sinusoidal)
        │
        ▼
DECODER-ONLY TRANSFORMER BACKBONE (×N_layers)
  Causal multi-head self-attention
    (masked: each position attends only to itself and earlier positions)
  LayerNorm + residual connection
  Feed-forward network (Linear → GELU → Linear, d_model → 4·d_model → d_model)
  LayerNorm + residual connection
        │
        ▼
OUTPUT HEADS
  PRETRAINING: next-state prediction head
    For each position t, predict the state at t+1:
      h_t → Linear → softmax → P(geohash_token_{t+1})
      h_t → Linear → softmax → P(COG_bin_{t+1})
      h_t → Linear → softmax → P(SOG_bin_{t+1})
      h_t → Linear → softmax → P(ROT_bin_{t+1})
      h_t → Linear → softmax → P(alt_rate_bin_{t+1})
      h_t → Linear → softmax → P(alt_band_{t+1})
    Loss = Σ CrossEntropy(predicted_feature, true_feature)
  DOWNSTREAM: activity classification head
    (attached after pretraining; backbone frozen or fine-tuned)
    h_[BOS] or mean(h_1:T) → MLP → softmax → class label

3. The Four Embedding Types (Detailed)

3.1 Geohash Embeddings β€” Spatial Position Encoding

Purpose: Encode the aircraft's 3D geographic position as a discrete token.

Method: We use the H3 hexagonal hierarchical spatial index (Uber's H3) at resolution 5 (hex area ≈ 252 km², edge ≈ 9.85 km) for en-route flight, with an option to use resolution 7 (≈ 5.16 km², edge ≈ 1.22 km) for terminal areas. This follows the H3-CLM paper's approach, adapted for aviation's larger spatial scale.

3D Extension: Since aircraft operate in 3D, we combine the H3 cell with an altitude band:

Geohash Token = H3_cell_index × N_alt_bands + alt_band_index

Altitude bands (1000 ft increments):
  Band 0:     0 - 1,000 ft    (ground / taxi)
  Band 1:  1,000 - 2,000 ft   (initial climb / approach)
  ...
  Band 45: 45,000 - 46,000 ft (high cruise)

  N_alt_bands = 46

Vocabulary size: At H3 resolution 5, the number of unique cells covering typical airspace is 100K-200K. With altitude bands: `200K × 46 ≈ 9.2M` — too large for direct embedding.

Solution β€” Factored Embedding:

E_geohash = E_h3[h3_cell_id] + E_alt[alt_band_id]

E_h3:  learned embedding table, vocab = N_h3_cells (~200K or hashing trick to 50K)
E_alt: learned embedding table, vocab = 46

Both project to d_model dimensions.

The hashing trick: Map H3 cell indices through a hash function to a fixed vocabulary of ~50,000 buckets. This bounds memory while maintaining spatial discrimination.

Why H3 over traditional geohash: H3 hexagons have uniform area (no polar distortion), hierarchical nesting, and consistent neighbor relationships — critical for trajectory continuity.
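The factored lookup can be sketched in a few lines of NumPy. The randomly initialized tables stand in for learned parameters, and the plain modulo is an illustrative stand-in for a proper mixing hash; in practice the cell index would come from the h3 library rather than being passed in directly.

```python
import numpy as np

D_MODEL = 256
N_HASH_BUCKETS = 50_000   # hashing-trick vocabulary for H3 cells
N_ALT_BANDS = 46          # 1,000 ft bands covering 0 - 46,000 ft

rng = np.random.default_rng(0)
E_h3 = rng.normal(0.0, 0.02, (N_HASH_BUCKETS, D_MODEL))   # hashed H3 table
E_alt = rng.normal(0.0, 0.02, (N_ALT_BANDS, D_MODEL))     # altitude-band table

def geohash_embedding(h3_cell_id: int, altitude_ft: float) -> np.ndarray:
    """Factored 3D position embedding: hashed H3 cell + altitude band."""
    bucket = h3_cell_id % N_HASH_BUCKETS                   # hashing trick
    band = int(min(max(altitude_ft // 1000, 0), N_ALT_BANDS - 1))
    return E_h3[bucket] + E_alt[band]                      # shape (d_model,)
```

Because the two tables are summed rather than crossed, memory stays at 50,046 rows instead of the ~9.2M a joint vocabulary would need.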

3.2 Temporal Embeddings β€” When Is the Aircraft Flying?

Purpose: Encode temporal context β€” time of day affects traffic density, routes, and behavior.

Method: Additive composition of multiple temporal scales:

E_temporal = E_hour[hour_of_day] + E_dow[day_of_week] + E_month[month]

E_hour:  24 entries  (captures rush hour vs. night patterns)
E_dow:    7 entries  (weekday vs. weekend traffic)
E_month: 12 entries  (seasonal routes, weather patterns)

All project to d_model dimensions.

Optional — sinusoidal sub-minute encoding: For sub-minute resolution:

E_minute = [sin(2π × minute / 60), cos(2π × minute / 60)]  → linear → d_model
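The additive temporal composition above can be sketched as follows; the randomly initialized tables are placeholders for learned embeddings:

```python
import numpy as np
from datetime import datetime, timezone

D_MODEL = 256
rng = np.random.default_rng(0)
E_hour = rng.normal(0.0, 0.02, (24, D_MODEL))    # hour of day
E_dow = rng.normal(0.0, 0.02, (7, D_MODEL))      # day of week
E_month = rng.normal(0.0, 0.02, (12, D_MODEL))   # month

def temporal_embedding(unix_ts: float) -> np.ndarray:
    """Additive temporal embedding: hour-of-day + day-of-week + month."""
    t = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return E_hour[t.hour] + E_dow[t.weekday()] + E_month[t.month - 1]
```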

3.3 Uncertainty Embeddings — How Confident Are We?

Purpose: Encode the model's uncertainty about the current trajectory state. Aircraft in straight-and-level cruise have low uncertainty; aircraft maneuvering near airports have high uncertainty.

Method: Compute a trajectory smoothness score from recent states, then discretize:

Uncertainty sources (sliding window of k=5 recent states):

1. Position variance:  σ²_pos = var(Δlat) + var(Δlon)
2. Heading variance:   σ²_COG = circular_var(COG_{t-k:t})
3. Speed variance:     σ²_SOG = var(SOG_{t-k:t})
4. Altitude variance:  σ²_alt = var(alt_rate_{t-k:t})

Combined uncertainty score:
  U_t = w1·σ²_pos + w2·σ²_COG + w3·σ²_SOG + w4·σ²_alt

Discretize into N_uncert = 16 bins (quantile binning on training data)

E_uncertainty = E_uncert_table[bin(U_t)]   →  d_model

Weights w1-w4: Hyperparameters tuned on validation data, or learned as part of the model.

During inference: For multi-step prediction, uncertainty can be updated using MC-Dropout or ensemble disagreement.
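The smoothness score can be computed directly from the sliding window. This sketch uses the standard "1 minus mean resultant length" definition of circular variance for the heading term, and equal weights as an illustrative default:

```python
import numpy as np

def circular_var(deg):
    """Circular variance of angles in degrees: 1 - |mean resultant vector|."""
    rad = np.radians(deg)
    return 1.0 - np.hypot(np.mean(np.cos(rad)), np.mean(np.sin(rad)))

def uncertainty_score(lats, lons, cogs, sogs, alt_rates, w=(1.0, 1.0, 1.0, 1.0)):
    """Combined smoothness score over the last k states (arrays of shape (k,))."""
    u = (w[0] * (np.var(np.diff(lats)) + np.var(np.diff(lons)))  # position
         + w[1] * circular_var(cogs)                             # heading
         + w[2] * np.var(sogs)                                   # speed
         + w[3] * np.var(alt_rates))                             # vertical rate
    return float(u)
```

Straight-and-level cruise (constant heading, speed, and climb rate) scores near zero; a turning aircraft scores higher, and the scores are then quantile-binned into the 16 uncertainty bins.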

3.4 Prompt Embeddings — Task and Context Metadata

Purpose: Provide metadata context about the flight, analogous to system prompts in LLMs. Enables task conditioning and multi-task learning.

Method: Learnable prompt tokens prepended to the trajectory:

Prompt token vocabulary:
  - Aircraft category:  [HEAVY, LARGE, SMALL, ROTORCRAFT, GLIDER, UAV, UNKNOWN]  (7)
  - Flight phase:       [CLIMB, CRUISE, DESCENT, APPROACH, GROUND, UNKNOWN]       (6)
  - Region:             [CONUS, EUROPE, ASIA, OTHER]                               (4)
  - Task:               [PREDICT, CLASSIFY, DETECT_ANOMALY]                        (3)
  - Special:            [BOS, EOS, PAD, MASK]                                      (4)

Total prompt vocab: ~24 tokens

Prompt sequence (prepended):
  [BOS, TASK_TOKEN, AIRCRAFT_TOKEN, PHASE_TOKEN, REGION_TOKEN]

Each has a learned embedding of dimension d_model.

For downstream classification: Change TASK_TOKEN to CLASSIFY; output at BOS position is used for classification.
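A possible token-id mapping for this vocabulary; the UNKNOWN_AC / UNKNOWN_PHASE suffixes are an assumption made here to disambiguate the two UNKNOWN entries while keeping the total at 24 tokens:

```python
PROMPT_VOCAB = {tok: i for i, tok in enumerate([
    "BOS", "EOS", "PAD", "MASK",                                   # special (4)
    "PREDICT", "CLASSIFY", "DETECT_ANOMALY",                       # task (3)
    "HEAVY", "LARGE", "SMALL", "ROTORCRAFT",
    "GLIDER", "UAV", "UNKNOWN_AC",                                 # aircraft (7)
    "CLIMB", "CRUISE", "DESCENT", "APPROACH",
    "GROUND", "UNKNOWN_PHASE",                                     # phase (6)
    "CONUS", "EUROPE", "ASIA", "OTHER",                            # region (4)
])}

def prompt_ids(task="PREDICT", aircraft="UNKNOWN_AC",
               phase="UNKNOWN_PHASE", region="OTHER"):
    """Build the prepended prefix [BOS, TASK, AIRCRAFT, PHASE, REGION]."""
    return [PROMPT_VOCAB[t] for t in ("BOS", task, aircraft, phase, region)]
```

Switching to the downstream task is then just `prompt_ids(task="CLASSIFY")`.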


4. Feature Derivation Pipeline

4.1 Raw Input

timestamp (Unix epoch seconds)
latitude  (degrees, WGS84)
longitude (degrees, WGS84)
altitude  (feet, barometric or geometric)

4.2 Derived Features

import numpy as np

def derive_features(timestamps, lats, lons, alts):
    """
    Derive COG, SOG, ROT, and altitude rate from raw position data.
    All inputs: numpy arrays of shape (N,) for a single trajectory.
    Returns arrays of shape (N,) β€” first element is NaN.
    """
    dt = np.diff(timestamps)  # seconds
    dt = np.maximum(dt, 1e-6)  # avoid division by zero
    
    # --- Course Over Ground (COG) ---
    lat1, lat2 = np.radians(lats[:-1]), np.radians(lats[1:])
    dlon = np.radians(np.diff(lons))
    
    x = np.sin(dlon) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(dlon)
    COG = np.degrees(np.arctan2(x, y)) % 360  # [0, 360)
    
    # --- Speed Over Ground (SOG) ---
    dlat = np.radians(np.diff(lats))
    a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    distance_nm = 3440.065 * c  # Earth radius in nautical miles
    SOG = distance_nm / (dt / 3600)  # knots
    
    # --- Rate of Turn (ROT) ---
    dCOG = np.diff(COG)
    dCOG = (dCOG + 180) % 360 - 180  # normalize to [-180, 180]
    ROT = np.full(len(lats), np.nan)
    ROT[2:] = dCOG / dt[1:]  # degrees per second
    
    # --- Rate of Altitude Change ---
    dalt = np.diff(alts)  # feet
    alt_rate = dalt / (dt / 60)  # feet per minute
    
    # Pad first elements
    COG_full = np.concatenate([[np.nan], COG])
    SOG_full = np.concatenate([[np.nan], SOG])
    alt_rate_full = np.concatenate([[np.nan], alt_rate])
    
    return COG_full, SOG_full, ROT, alt_rate_full

4.3 Feature Discretization

| Feature | Range | Bin Width | N_bins | Notes |
|---------|-------|-----------|--------|-------|
| COG | [0, 360)° | 5° | 72 | Circular |
| SOG | [0, 600] kts | 5 knots | 121 | Capped at ~Mach 1 |
| ROT | [-6, 6] °/s | 0.25 °/s | 49 | Capped at ±6 °/s |
| Altitude rate | [-6000, 6000] fpm | 200 ft/min | 61 | Capped at ±6000 fpm |

Outliers beyond caps clipped to boundary bin.
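The clip-then-floor scheme in the table can be sketched as follows; the `make_binner` helper is a hypothetical convenience, and the resulting bin counts match the table, with boundary and out-of-range values landing in the last bin:

```python
import numpy as np

def make_binner(lo, hi, width, circular=False):
    """Uniform binner with clipping; outliers land in the boundary bins."""
    def to_bin(x):
        x = np.asarray(x, dtype=float)
        if circular:
            x = x % (hi - lo) + lo   # wrap angles into [lo, hi)
        x = np.clip(x, lo, hi)       # outliers -> boundary
        return ((x - lo) // width).astype(int)
    return to_bin

cog_bin = make_binner(0, 360, 5, circular=True)   # bins 0..71  (72)
sog_bin = make_binner(0, 600, 5)                  # bins 0..120 (121)
rot_bin = make_binner(-6, 6, 0.25)                # bins 0..48  (49)
alt_rate_bin = make_binner(-6000, 6000, 200)      # bins 0..60  (61)
```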

4.4 Trajectory Preprocessing Pipeline

1. Segment raw ADS-B by ICAO24 + temporal gaps > 15 min → individual flights
2. Resample to fixed Δt = 60 seconds (linear interp for position, circular for heading)
3. Derive features (COG, SOG, ROT, alt_rate)
4. Drop first 2 points per trajectory (NaN from derivation)
5. Filter: remove trajectories with < 20 points (< 20 minutes)
6. Compute H3 cell (res 5) + altitude band for each point
7. Discretize all continuous features into bins
8. Compute uncertainty scores (sliding window k=5)
9. Extract temporal features (hour, dow, month)
10. Construct prompt tokens from metadata (if available)
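Step 1 (gap-based segmentation) can be sketched as follows, assuming the timestamps for a single ICAO24 are already sorted:

```python
import numpy as np

def split_flights(timestamps, gap_s=15 * 60):
    """Split one aircraft's sorted ADS-B times into flights at gaps > 15 min.
    Returns a list of index arrays, one per flight segment."""
    ts = np.asarray(timestamps, dtype=float)
    breaks = np.where(np.diff(ts) > gap_s)[0] + 1   # first index after each gap
    return np.split(np.arange(len(ts)), breaks)
```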

5. Model Hyperparameters

5.1 Model Dimensions

| Parameter | Value | Rationale |
|-----------|-------|-----------|
| d_model | 256 | H3-CLM found 256-1024 effective |
| n_heads | 8 | head_dim = 32 |
| n_layers | 8 | Moderate depth for a ~10M-param model |
| d_ff | 1024 | 4× d_model (standard) |
| max_seq_len | 128 | 128 states × 60 s ≈ 2 hours of flight |
| n_prompt_tokens | 5 | [BOS, TASK, AIRCRAFT, PHASE, REGION] |
| dropout | 0.1 | |

Total parameters: ~8-12M (trainable on a single GPU in hours)

5.2 Vocabulary Sizes

| Embedding | Vocab | Dim |
|-----------|-------|-----|
| H3 cells | 50,000 | 256 |
| Altitude bands | 46 | 256 |
| COG bins | 72 | 256 |
| SOG bins | 121 | 256 |
| ROT bins | 49 | 256 |
| Alt rate bins | 61 | 256 |
| Hour of day | 24 | 256 |
| Day of week | 7 | 256 |
| Month | 12 | 256 |
| Uncertainty bins | 16 | 256 |
| Prompt tokens | 24 | 256 |

5.3 State Token Composition

Each timestep → a single state token via additive fusion:

E_state_t = E_h3[h3_id_t] + E_alt_band[alt_band_t]            # Geohash (3D position)
          + E_COG[cog_bin_t] + E_SOG[sog_bin_t]                # Kinematics
          + E_ROT[rot_bin_t] + E_alt_rate[alt_rate_bin_t]       # Dynamics
          + E_hour[hour_t] + E_dow[dow_t] + E_month[month_t]   # Temporal
          + E_uncert[uncert_bin_t]                               # Uncertainty

E_state_t ∈ R^{d_model}

This additive fusion follows BERT (token + segment + position) and TrAISFormer.


6. Training Recipe

6.1 Pretraining: Next-State Prediction (Causal LM)

Objective: Given states 1..T, predict state at T+1 (applied autoregressively at every position).

Loss:

L = Σ_{t=1}^{T-1} [ λ_geo · CE(ŷ_geo_t, y_geo_{t+1})
                    + λ_COG · CE(ŷ_COG_t, y_COG_{t+1})
                    + λ_SOG · CE(ŷ_SOG_t, y_SOG_{t+1})
                    + λ_ROT · CE(ŷ_ROT_t, y_ROT_{t+1})
                    + λ_alt · CE(ŷ_alt_rate_t, y_alt_rate_{t+1})
                    + λ_altb · CE(ŷ_alt_band_t, y_alt_band_{t+1}) ]

λ values default to 1.0 (equal weighting).
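The multi-task loss can be sketched in NumPy with a numerically stable log-softmax; the head names and dict-based interface are illustrative, not part of the architecture:

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean CE over positions; logits (T, V), targets (T,) of class ids."""
    z = logits - logits.max(axis=-1, keepdims=True)          # stability shift
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True)) # log-softmax
    return -logp[np.arange(len(targets)), targets].mean()

def multi_task_loss(head_logits, next_targets, lambdas=None):
    """Weighted sum of per-feature CE terms, keyed by feature name."""
    lambdas = lambdas or {k: 1.0 for k in head_logits}
    return sum(lambdas[k] * cross_entropy(head_logits[k], next_targets[k])
               for k in head_logits)
```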

Training hyperparameters (based on FTP-LLM + H3-CLM):

| Parameter | Value |
|-----------|-------|
| Optimizer | AdamW |
| Learning rate | 5e-4 |
| LR schedule | Cosine with 5% warmup |
| Batch size (per GPU) | 64 |
| Gradient accumulation | 4 (effective batch = 256) |
| Max epochs | 30 (early stopping, patience = 5) |
| Weight decay | 0.01 |
| Gradient clipping | 1.0 |
| Mixed precision | bf16 |

Data windowing: Sliding window size=128, stride=64 (50% overlap).

6.2 Downstream: Activity Classification

After pretraining, attach classification head:

h_BOS → Linear(256, 128) → GELU → Dropout(0.1) → Linear(128, N_classes)

Fine-tuning options:

  • A: Freeze backbone, train head only (fast, small data)
  • B: Full fine-tune, backbone lr=1e-5, head lr=1e-3

7. Dataset Strategy

7.1 Prototyping β€” traffic Python Library

from traffic.data.samples import landing_zurich_2019
# ~2,000 flights near Zurich
# Columns: timestamp, icao24, callsign, latitude, longitude, altitude,
#          groundspeed, track, vertical_rate, ...

Pros: instant access, clean, well-documented. Cons: single airport, limited diversity.

7.2 Training β€” OpenSky Network

from pyopensky.trino import Trino
trino = Trino()
df = trino.rawquery("""
    SELECT time, icao24, lat, lon, baroaltitude, velocity, heading, vertrate
    FROM state_vectors_data4
    WHERE hour >= '2024-01-15 00:00:00'
      AND hour <  '2024-01-15 12:00:00'
      AND lat BETWEEN 40 AND 55
      AND lon BETWEEN -10 AND 20
    ORDER BY icao24, time
""")

Target:

  • Region A (train): Europe, 1 month β†’ ~500K-1M flights
  • Region B (OOD test): US CONUS, 1 week β†’ ~200K flights
  • Region C (far test): East Asia, 1 week β†’ ~100K flights

7.3 Alternative: SCAT Dataset

~170K en-route flights over Sweden, hosted on Zenodo. Pre-segmented and clean.

7.4 Data Split

Training:    70% of Region A flights
Validation:  15% of Region A flights  
Test (IID):  15% of Region A flights
Test (OOD):  100% of Region B flights
Test (Far):  100% of Region C flights

Split by flight (not time window) to avoid data leakage.
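One way to make the by-flight split deterministic and leakage-free is to hash the flight id; the seed string is a hypothetical choice, and the thresholds follow the 70/15/15 split above:

```python
import hashlib

def flight_split(flight_id: str, seed: str = "airtracklm") -> str:
    """Deterministic 70/15/15 partition by flight (never by window), so
    overlapping windows from one flight all land in the same partition."""
    h = int(hashlib.sha256(f"{seed}:{flight_id}".encode()).hexdigest(), 16)
    r = (h % 10_000) / 10_000   # pseudo-uniform in [0, 1)
    return "train" if r < 0.70 else "val" if r < 0.85 else "test_iid"
```

Using SHA-256 rather than Python's built-in `hash` keeps the split stable across interpreter runs.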


8. Ablation Study: Geohash Geographic Dependency

8.1 Hypothesis

Geohash embeddings encode absolute geographic position, causing the model to memorize region-specific patterns (airways, approach paths, airspace structure). This improves in-distribution performance but degrades transfer to unseen regions.

8.2 Experimental Variants

| Variant | Geohash Type | Description |
|---------|--------------|-------------|
| V1: Full Model | H3 absolute | Complete architecture as described |
| V2: No Geohash | None | Remove geohash entirely; the model sees only kinematics + temporal + uncertainty |
| V3: Relative Geohash | H3 relative | H3 cell of (Δlat, Δlon) from the trajectory start; position-invariant |
| V4: Multi-Resolution | H3 res 3+5+7 | Three resolutions summed (coarse to fine) |
| V5: Continuous Position | Linear projection | Linear([lat, lon, alt] → d_model); no discretization |
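Variant V3 changes only the tokenizer input: positions are re-anchored to a fixed origin before H3 encoding. A minimal sketch, where the (0, 0) anchor is an illustrative assumption:

```python
import numpy as np

def relative_track(lats, lons):
    """V3: re-anchor a trajectory at a fixed reference origin so its H3
    cells encode displacement from the start, not absolute position."""
    REF_LAT, REF_LON = 0.0, 0.0                             # fixed anchor
    dlat = np.asarray(lats, dtype=float) - lats[0]
    dlon = (np.asarray(lons, dtype=float) - lons[0] + 180) % 360 - 180  # wrap
    return REF_LAT + dlat, REF_LON + dlon   # feed these into the H3 tokenizer
```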

8.3 Evaluation Metrics

For each variant × each test set (IID, OOD, Far):

| Metric | Description |
|--------|-------------|
| Geo accuracy | % of correct H3 cell predictions |
| Position MAE | Mean absolute error in km |
| COG MAE | Heading error in degrees |
| SOG MAE | Speed error in knots |
| Multi-step ADE | Average displacement error over 5 predicted steps |
| Multi-step FDE | Final displacement error at step 5 |

8.4 Key Comparisons

| Comparison | Tests |
|------------|-------|
| V1 vs V2 (IID) | How much geohash helps when the test region equals the training region |
| V1 vs V2 (OOD) | If V2 > V1 on OOD, geohash causes geographic overfitting |
| V1 vs V3 (OOD) | If V3 does well on both IID and OOD, relative geohash is the sweet spot |
| V4 (all) | Whether coarse cells transfer while fine cells specialize |
| V5 (all) | Whether continuous encoding avoids discretization issues |

8.5 Expected Outcomes

  • V1: Best IID, worst OOD (hypothesis)
  • V3: Best compromise β€” predicted winner
  • V5: May struggle (loses discrete token structure transformers excel at)
  • V2: Strong OOD baseline, sacrifices IID

8.6 Additional Analysis

  • Attention visualization: V1 vs V3 attention patterns
  • Embedding clustering: t-SNE of geohash embeddings colored by region
  • Learning curves: IID vs OOD performance vs training data size

9. Implementation Phases

Phase 1: Data Pipeline (Week 1)

  • Set up traffic library, extract sample trajectories
  • Implement feature derivation (COG, SOG, ROT, alt_rate)
  • Implement H3 geohash encoding + altitude banding
  • Implement feature discretization (binning)
  • Implement uncertainty score computation
  • Build PyTorch Dataset class with sliding window
  • Unit tests for all derivation functions

Phase 2: Model Architecture (Week 1-2)

  • Implement all embedding tables
  • Implement additive fusion layer
  • Implement prompt token prepending
  • Implement decoder-only transformer backbone
  • Implement multi-head output (6 prediction heads)
  • Implement classification head (for downstream)
  • Forward pass test with dummy data

Phase 3: Pretraining (Week 2-3)

  • Implement training loop with multi-task loss
  • Prototyping run on traffic data (small, fast iteration)
  • Scale to OpenSky data
  • Monitor loss curves, validate convergence
  • Save best checkpoint

Phase 4: Downstream Adaptation (Week 3-4)

  • Implement classification fine-tuning pipeline
  • Test on activity classification task
  • Compare frozen vs. fine-tuned backbone

Phase 5: Ablation Study (Week 4-5)

  • Implement all 5 geohash variants
  • Train each variant with identical hyperparameters
  • Evaluate on IID, OOD, and Far test sets
  • Generate comparison tables and visualizations
  • Write analysis of geographic dependency findings

10. Key Design Decisions & Rationale

| Decision | Choice | Why |
|----------|--------|-----|
| Custom model vs. pretrained LLM | Custom ~10M-param transformer | FTP-LLM showed text-tokenized LLMs work, but a custom model allows proper multi-feature fusion; 10M params trains in hours |
| H3 vs. traditional geohash | H3 | Uniform hexagonal cells, no polar distortion, hierarchical; proven by H3-CLM |
| Additive vs. concatenative fusion | Additive | BERT/TrAISFormer paradigm; keeps d_model constant, while concatenation would grow to d_model × N_features |
| 60 s time resolution | 60 seconds | FTP-LLM validated 1-min aggregation; 128 steps ≈ 2+ hours |
| Factored geohash (H3 + alt) | Separate tables, summed | Avoids combinatorial explosion (9.2M → 50K + 46) |
| Multi-head output | Separate softmax per feature | More interpretable; allows per-feature analysis |
| Uncertainty from smoothness | Variance-based | Computable at data-preparation time; no inference overhead |

11. Risk Analysis

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Geohash overfits to region | High | High | Ablation study; V3 (relative) is the fallback |
| OpenSky access issues | Medium | High | Fall back to traffic samples + SCAT |
| 60 s too coarse for terminal areas | Medium | Low | Separate terminal-area model at 10 s |
| Model too small | Low | Medium | Scale up: d_model → 512, n_layers → 16 (~40M params) |
| Altitude discretization too coarse | Low | Low | Refine to 500 ft bands (92 bands) |

12. Monitoring & Evaluation

During training (Trackio):

  • Total loss + per-feature loss curves
  • Validation loss each epoch
  • LR schedule, GPU utilization

After training:

  • Next-state accuracy (top-1, top-5 per feature)
  • Position error in km
  • Multi-step prediction (1, 5, 10, 20 steps ahead)
  • Downstream classification F1/precision/recall

Grounded in: FTP-LLM, H3-CLM, GeoFormer, TrAISFormer, and LLM4STP (reconstructed). Ready for implementation upon approval.