# AirTrackLM: LLM4STP Adapted for ADS-B Air Track Prediction

## Complete Architecture & Implementation Plan

---

## 1. Executive Summary

We adapt the LLM4STP multi-feature fusion architecture (originally for maritime AIS ship trajectory prediction) to work with **ADS-B air track data**. The model uses a **decoder-only transformer** with four specialized embedding types (Prompt, Uncertainty, Geohash, and Temporal) fused together for **next-state prediction** pretraining. Once pretrained, the model is adaptable to downstream tasks like activity classification.

This design is grounded in published results from:
- **FTP-LLM** (arXiv:2501.17459): LLaMA-3.1-8B for flight trajectory prediction
- **H3-CLM** (arXiv:2405.09596): H3 geohash + causal LM for maritime trajectories
- **GeoFormer** (arXiv:2311.05092): GPT-style geospatial tokenization
- **TrAISFormer** (arXiv:2109.03958): discrete tokenization of AIS features

---

## 2. System Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                       RAW ADS-B INPUT                       │
│          (timestamp, latitude, longitude, altitude)         │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                 FEATURE DERIVATION PIPELINE                 │
│                                                             │
│   Raw:     lat, lon, alt                                    │
│   Derived: COG, SOG, ROT, altitude_rate                     │
│   Meta:    timestamp → (hour, day_of_week, month)           │
│                                                             │
│   Output per timestep:                                      │
│   state_t = [lat, lon, alt, COG, SOG, ROT, alt_rate]        │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                   TOKENIZATION / ENCODING                   │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │   Geohash    │  │  Continuous  │  │   Temporal   │       │
│  │  Tokenizer   │  │ Discretizer  │  │   Encoder    │       │
│  │              │  │              │  │              │       │
│  │ lat,lon,alt  │  │ COG,SOG,ROT  │  │  hour,dow,   │       │
│  │ → H3 cell +  │  │   alt_rate   │  │    month     │       │
│  │   alt_band   │  │  → bin IDs   │  │ → time IDs   │       │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘       │
│         ▼                 ▼                 ▼               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │   Geohash    │  │   Feature    │  │   Temporal   │       │
│  │  Embedding   │  │  Embeddings  │  │  Embedding   │       │
│  │    Table     │  │    Tables    │  │    Table     │       │
│  │  (d_model)   │  │  (d_model)   │  │  (d_model)   │       │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘       │
└─────────┼─────────────────┼─────────────────┼───────────────┘
          ▼                 ▼                 ▼
┌─────────────────────────────────────────────────────────────┐
│                   EMBEDDING FUSION LAYER                    │
│                                                             │
│   Geohash + Feature + Temporal + Uncertainty Embeddings     │
│   (each of dimension d_model)                               │
│                              │                              │
│                              ▼                              │
│   E_state = E_geo + E_feat + E_temp + E_uncert              │
│                              │                              │
│                              ▼                              │
│   Prompt Embedding (prepended prefix):                      │
│   [PROMPT_1, PROMPT_2, ..., PROMPT_k]                       │
│                              │                              │
│                              ▼                              │
│   Input: [PROMPT_TOKENS | STATE_1 | STATE_2 | ... | STATE_T]│
│                              │                              │
│                              ▼                              │
│   Linear Projection → d_model                               │
│                              │                              │
│                              ▼                              │
│   + Positional Encoding (sinusoidal)                        │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│              DECODER-ONLY TRANSFORMER BACKBONE              │
│                                                             │
│   Transformer Block × N_layers:                             │
│                                                             │
│     Causal Multi-Head Self-Attention                        │
│     (masked: each position attends only to                  │
│      itself and earlier positions)                          │
│                              │                              │
│                              ▼                              │
│     LayerNorm + Residual Connection                         │
│                              │                              │
│                              ▼                              │
│     Feed-Forward Network (Linear → GELU → Linear)           │
│     d_model → 4*d_model → d_model                           │
│                              │                              │
│                              ▼                              │
│     LayerNorm + Residual Connection                         │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                        OUTPUT HEADS                         │
│                                                             │
│   PRETRAINING: Next-State Prediction Head                   │
│                                                             │
│   For each position t, predict state at t+1:                │
│     h_t → Linear → softmax → P(geohash_token_{t+1})         │
│     h_t → Linear → softmax → P(COG_bin_{t+1})               │
│     h_t → Linear → softmax → P(SOG_bin_{t+1})               │
│     h_t → Linear → softmax → P(ROT_bin_{t+1})               │
│     h_t → Linear → softmax → P(alt_rate_bin_{t+1})          │
│     h_t → Linear → softmax → P(alt_band_{t+1})              │
│                                                             │
│   Loss = Σ CrossEntropy(predicted_feature, true_feature)    │
│                                                             │
│   DOWNSTREAM: Activity Classification Head                  │
│   (attached after pretraining, frozen or fine-tuned)        │
│                                                             │
│   h_[BOS] or mean(h_1:T) → MLP → softmax → class label      │
└─────────────────────────────────────────────────────────────┘
```

---

## 3. The Four Embedding Types (Detailed)

### 3.1 Geohash Embeddings: Spatial Position Encoding

**Purpose**: Encode the aircraft's 3D geographic position as a discrete token.

**Method**: We use the **H3 hexagonal hierarchical spatial index** (Uber's H3) at resolution 5 (hex area ≈ 252 km², average edge ≈ 9.85 km) for en-route flight, with an option to use resolution 7 (≈ 5.16 km², edge ≈ 1.22 km) for terminal areas. This follows the H3-CLM paper's approach, adapted for aviation's larger spatial scale.

**3D Extension**: Since aircraft operate in 3D, we combine the H3 cell with an **altitude band**:
```
Geohash Token = H3_cell_index × N_alt_bands + alt_band_index

Altitude bands (1,000 ft increments):
  Band 0:  0 - 1,000 ft        (ground / taxi)
  Band 1:  1,000 - 2,000 ft    (initial climb / approach)
  ...
  Band 45: 45,000 - 46,000 ft  (high cruise)

N_alt_bands = 46
```

**Vocabulary size**: At H3 resolution 5, the number of unique cells covering typical airspace is ~100K-200K. With altitude bands: `~200K × 46 ≈ 9.2M`, far too large for a direct embedding table.

**Solution: Factored Embedding**:
```
E_geohash = E_h3[h3_cell_id] + E_alt[alt_band_id]

E_h3:  learned embedding table, vocab = N_h3_cells (~200K, or hashing trick to 50K)
E_alt: learned embedding table, vocab = 46

Both project to d_model dimensions.
```

The **hashing trick**: Map H3 cell indices through a hash function to a fixed vocabulary of ~50,000 buckets. This bounds memory while maintaining spatial discrimination.
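The factored lookup plus hashing trick can be sketched as follows. This is a minimal sketch: the random tables stand in for learned embedding weights, the helper names are illustrative, and the H3 index used at the end is just an example 64-bit cell value.

```python
import zlib
import numpy as np

N_BUCKETS = 50_000    # hashed H3 vocabulary (per the text above)
N_ALT_BANDS = 46
D_MODEL = 256

rng = np.random.default_rng(0)
E_h3 = rng.standard_normal((N_BUCKETS, D_MODEL), dtype=np.float32)
E_alt = rng.standard_normal((N_ALT_BANDS, D_MODEL), dtype=np.float32)

def h3_bucket(h3_cell_index: int) -> int:
    """Hashing trick: fold a 64-bit H3 cell index into a fixed bucket range."""
    return zlib.crc32(h3_cell_index.to_bytes(8, "little")) % N_BUCKETS

def alt_band(altitude_ft: float) -> int:
    """1,000 ft bands, clipped into [0, 45]."""
    return int(min(max(altitude_ft, 0.0) // 1000, N_ALT_BANDS - 1))

def geohash_embedding(h3_cell_index: int, altitude_ft: float) -> np.ndarray:
    """Factored embedding: E_geohash = E_h3[hash(cell)] + E_alt[band]."""
    return E_h3[h3_bucket(h3_cell_index)] + E_alt[alt_band(altitude_ft)]

vec = geohash_embedding(0x85283473FFFFFFF, 35_000.0)  # example res-5 H3 index
```

CRC32 is used here only because it is deterministic across runs; any stable hash with good dispersion would do.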

**Why H3 over traditional geohash**: H3 hexagons have near-uniform area (no polar distortion), hierarchical nesting, and consistent neighbor relationships, which is critical for trajectory continuity.

### 3.2 Temporal Embeddings: When Is the Aircraft Flying?

**Purpose**: Encode temporal context; time of day affects traffic density, routes, and behavior.

**Method**: Additive composition of multiple temporal scales:
```
E_temporal = E_hour[hour_of_day] + E_dow[day_of_week] + E_month[month]

E_hour:  24 entries (captures rush hour vs. night patterns)
E_dow:   7 entries  (weekday vs. weekend traffic)
E_month: 12 entries (seasonal routes, weather patterns)

All project to d_model dimensions.
```

**Optional: Sinusoidal Minute-of-Hour Encoding**: For finer-than-hour resolution:
```
E_minute = [sin(2π × minute / 60), cos(2π × minute / 60)] → linear → d_model
```
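The index derivation above can be sketched with the standard library alone; timestamps are assumed to be UTC Unix seconds, and the helper names are illustrative:

```python
import math
from datetime import datetime, timezone

def temporal_ids(unix_ts: float):
    """Map a Unix timestamp to (hour, day_of_week, month) lookup indices."""
    t = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return t.hour, t.weekday(), t.month - 1   # 0-23, 0-6 (Mon=0), 0-11

def minute_features(unix_ts: float):
    """Optional sinusoidal minute-of-hour features (fed through a linear layer)."""
    minute = datetime.fromtimestamp(unix_ts, tz=timezone.utc).minute
    angle = 2 * math.pi * minute / 60
    return math.sin(angle), math.cos(angle)

# 2024-01-15 12:30:00 UTC (a Monday)
hour, dow, month = temporal_ids(1705321800)
```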

### 3.3 Uncertainty Embeddings: How Confident Are We?

**Purpose**: Encode the model's uncertainty about the current trajectory state. Aircraft in straight-and-level cruise have low uncertainty; aircraft maneuvering near airports have high uncertainty.

**Method**: Compute a **trajectory smoothness score** from recent states, then discretize:

```
Uncertainty sources (sliding window of k=5 recent states):

1. Position variance:      σ²_pos = var(Δlat) + var(Δlon)
2. Heading variance:       σ²_COG = circular_var(COG_{t-k:t})
3. Speed variance:         σ²_SOG = var(SOG_{t-k:t})
4. Vertical-rate variance: σ²_alt_rate = var(alt_rate_{t-k:t})

Combined uncertainty score:
U_t = w1·σ²_pos + w2·σ²_COG + w3·σ²_SOG + w4·σ²_alt_rate

Discretize into N_uncert = 16 bins (quantile binning on training data)

E_uncertainty = E_uncert_table[bin(U_t)] → d_model
```

**Weights w1-w4**: Hyperparameters tuned on validation data, or learned as part of the model.
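A numpy sketch of the score for one window; the equal default weights and the "1 - R" (mean resultant length) definition of circular variance are assumptions of this sketch:

```python
import numpy as np

def circular_var(deg):
    """Circular variance: 1 - length of the mean resultant vector."""
    rad = np.radians(np.asarray(deg, dtype=float))
    return 1.0 - np.hypot(np.cos(rad).mean(), np.sin(rad).mean())

def uncertainty_score(lats, lons, cogs, sogs, alt_rates, w=(1.0, 1.0, 1.0, 1.0)):
    """U_t over one k-state window: w1*var_pos + w2*var_COG + w3*var_SOG + w4*var_alt_rate."""
    return float(w[0] * (np.var(np.diff(lats)) + np.var(np.diff(lons)))
                 + w[1] * circular_var(cogs)
                 + w[2] * np.var(sogs)
                 + w[3] * np.var(alt_rates))

# Straight-and-level cruise vs. a 180-degree turn over the same window
cruise = uncertainty_score(np.linspace(47.0, 47.1, 5), np.linspace(8.0, 8.4, 5),
                           np.full(5, 70.0), np.full(5, 450.0), np.zeros(5))
turning = uncertainty_score(np.linspace(47.0, 47.1, 5), np.linspace(8.0, 8.4, 5),
                            np.array([0.0, 45.0, 90.0, 135.0, 180.0]),
                            np.full(5, 450.0), np.zeros(5))
```

Circular variance is what makes a heading swing from 359° to 1° read as a small change rather than a huge one.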

**During inference**: For multi-step prediction, uncertainty can be updated using MC-Dropout or ensemble disagreement.

### 3.4 Prompt Embeddings: Task and Context Metadata

**Purpose**: Provide metadata context about the flight, analogous to system prompts in LLMs. This enables task conditioning and multi-task learning.

**Method**: Learnable prompt tokens prepended to the trajectory:

```
Prompt token vocabulary:
- Aircraft category: [HEAVY, LARGE, SMALL, ROTORCRAFT, GLIDER, UAV, UNKNOWN] (7)
- Flight phase:      [CLIMB, CRUISE, DESCENT, APPROACH, GROUND, UNKNOWN]     (6)
- Region:            [CONUS, EUROPE, ASIA, OTHER]                            (4)
- Task:              [PREDICT, CLASSIFY, DETECT_ANOMALY]                     (3)
- Special:           [BOS, EOS, PAD, MASK]                                   (4)

Total prompt vocab: 24 tokens

Prompt sequence (prepended):
[BOS, TASK_TOKEN, AIRCRAFT_TOKEN, PHASE_TOKEN, REGION_TOKEN]

Each has a learned embedding of dimension d_model.
```

**For downstream classification**: Change TASK_TOKEN to CLASSIFY; the output at the BOS position is used for classification.

---

## 4. Feature Derivation Pipeline

### 4.1 Raw Input
```
timestamp (Unix epoch seconds)
latitude  (degrees, WGS84)
longitude (degrees, WGS84)
altitude  (feet, barometric or geometric)
```

### 4.2 Derived Features

```python
import numpy as np

def derive_features(timestamps, lats, lons, alts):
    """
    Derive COG, SOG, ROT, and altitude rate from raw position data.
    All inputs: numpy arrays of shape (N,) for a single trajectory.
    Returns arrays of shape (N,); the first element of each is NaN
    (the first two for ROT, which needs two consecutive headings).
    """
    dt = np.diff(timestamps)          # seconds
    dt = np.maximum(dt, 1e-6)         # avoid division by zero

    # --- Course Over Ground (COG): initial great-circle bearing ---
    lat1, lat2 = np.radians(lats[:-1]), np.radians(lats[1:])
    dlon = np.radians(np.diff(lons))

    x = np.sin(dlon) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(dlon)
    COG = np.degrees(np.arctan2(x, y)) % 360          # [0, 360)

    # --- Speed Over Ground (SOG): haversine distance / elapsed time ---
    dlat = np.radians(np.diff(lats))
    a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    distance_nm = 3440.065 * c        # Earth radius in nautical miles
    SOG = distance_nm / (dt / 3600)   # knots

    # --- Rate of Turn (ROT) ---
    dCOG = np.diff(COG)
    dCOG = (dCOG + 180) % 360 - 180   # normalize to [-180, 180)
    ROT = np.full(len(lats), np.nan)
    ROT[2:] = dCOG / dt[1:]           # degrees per second

    # --- Rate of Altitude Change ---
    dalt = np.diff(alts)              # feet
    alt_rate = dalt / (dt / 60)       # feet per minute

    # Pad first elements (undefined for the first report)
    COG_full = np.concatenate([[np.nan], COG])
    SOG_full = np.concatenate([[np.nan], SOG])
    alt_rate_full = np.concatenate([[np.nan], alt_rate])

    return COG_full, SOG_full, ROT, alt_rate_full
```

### 4.3 Feature Discretization

| Feature       | Range             | Bin Width  | N_bins | Notes             |
|---------------|-------------------|------------|--------|-------------------|
| COG           | [0, 360)          | 5°         | 72     | Circular          |
| SOG           | [0, 600] kts      | 5 knots    | 121    | Capped at ~Mach 1 |
| ROT           | [-6, 6] °/s       | 0.25 °/s   | 49     | Capped ±6°/s      |
| Altitude Rate | [-6000, 6000] fpm | 200 ft/min | 61     | Capped ±6000 fpm  |

Outliers beyond the caps are clipped into the boundary bin.
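One consistent reading of the bin counts above (N = range/width + 1 for the capped features, range/width for the circular one) is nearest-bin-center indexing with clipping; that reading is an assumption of this sketch:

```python
import numpy as np

def bin_linear(x, lo, hi, width):
    """Clip to [lo, hi], then index by nearest bin center: ids 0..(hi-lo)/width."""
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    return np.rint((x - lo) / width).astype(int)

def bin_circular(deg, width=5.0):
    """COG: round to nearest 5-degree center, wrapping so 359° and 1° stay adjacent."""
    return np.rint(np.asarray(deg, dtype=float) / width).astype(int) % int(360 / width)

sog_bins = bin_linear([0, 3, 612, 450], lo=0, hi=600, width=5)   # ids in 0..120
rot_bins = bin_linear([-7.0, 0.1], lo=-6, hi=6, width=0.25)      # ids in 0..48
cog_bins = bin_circular([359.0, 2.0, 180.0])                     # ids in 0..71
```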

### 4.4 Trajectory Preprocessing Pipeline

```
1.  Segment raw ADS-B by ICAO24 + temporal gaps > 15 min → individual flights
2.  Resample to fixed Δt = 60 seconds (linear interp for position, circular for heading)
3.  Derive features (COG, SOG, ROT, alt_rate)
4.  Drop first 2 points per trajectory (NaN from derivation)
5.  Filter: remove trajectories with < 20 points (< 20 minutes)
6.  Compute H3 cell (res 5) + altitude band for each point
7.  Discretize all continuous features into bins
8.  Compute uncertainty scores (sliding window k=5)
9.  Extract temporal features (hour, dow, month)
10. Construct prompt tokens from metadata (if available)
```
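Step 1 (gap-based segmentation) can be sketched as follows; the 15-minute threshold comes from the list above, and the helper name is illustrative:

```python
import numpy as np

GAP_SECONDS = 15 * 60

def segment_flights(timestamps):
    """Split one ICAO24's time-sorted reports into flights wherever the gap
    between consecutive reports exceeds 15 minutes. Returns index arrays."""
    ts = np.asarray(timestamps, dtype=float)
    breaks = np.where(np.diff(ts) > GAP_SECONDS)[0] + 1
    return np.split(np.arange(len(ts)), breaks)

segments = segment_flights([0, 60, 120, 5000, 5060])  # ~81-min gap after index 2
```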

---

## 5. Model Hyperparameters

### 5.1 Model Dimensions

| Parameter       | Value | Rationale                             |
|-----------------|-------|---------------------------------------|
| d_model         | 256   | H3-CLM found 256-1024 effective       |
| n_heads         | 8     | head_dim = 32                         |
| n_layers        | 8     | Moderate depth for ~10M param model   |
| d_ff            | 1024  | 4× d_model (standard)                 |
| max_seq_len     | 128   | 128 states × 60 s ≈ 2 hours of flight |
| n_prompt_tokens | 5     | [BOS, TASK, AIRCRAFT, PHASE, REGION]  |
| dropout         | 0.1   |                                       |

**Total parameters**: ~8-12M (trainable on a single GPU in hours)

### 5.2 Vocabulary Sizes

| Embedding        | Vocab  | Dim |
|------------------|--------|-----|
| H3 cells         | 50,000 | 256 |
| Altitude bands   | 46     | 256 |
| COG bins         | 72     | 256 |
| SOG bins         | 121    | 256 |
| ROT bins         | 49     | 256 |
| Alt rate bins    | 61     | 256 |
| Hour of day      | 24     | 256 |
| Day of week      | 7      | 256 |
| Month            | 12     | 256 |
| Uncertainty bins | 16     | 256 |
| Prompt tokens    | 24     | 256 |

### 5.3 State Token Composition

Each timestep becomes a single state token via additive fusion:

```
E_state_t = E_h3[h3_id_t] + E_alt_band[alt_band_t]            # Geohash (3D position)
          + E_COG[cog_bin_t] + E_SOG[sog_bin_t]               # Kinematics
          + E_ROT[rot_bin_t] + E_alt_rate[alt_rate_bin_t]     # Dynamics
          + E_hour[hour_t] + E_dow[dow_t] + E_month[month_t]  # Temporal
          + E_uncert[uncert_bin_t]                            # Uncertainty

E_state_t ∈ R^{d_model}
```

This additive fusion follows the BERT paradigm (token + segment + position embeddings summed) and TrAISFormer.
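The additive composition can be sketched in numpy (random tables stand in for learned embedding weights; vocab sizes follow Section 5.2):

```python
import numpy as np

D_MODEL = 256
rng = np.random.default_rng(0)

# One lookup table per field; in the real model these are learned parameters.
VOCABS = {"h3": 50_000, "alt_band": 46, "cog": 72, "sog": 121, "rot": 49,
          "alt_rate": 61, "hour": 24, "dow": 7, "month": 12, "uncert": 16}
tables = {name: rng.standard_normal((vocab, D_MODEL), dtype=np.float32)
          for name, vocab in VOCABS.items()}

def fuse_state(ids):
    """Additive fusion: sum the ten per-field embedding rows into one state token."""
    return sum(tables[name][ids[name]] for name in VOCABS)

state = fuse_state({"h3": 123, "alt_band": 35, "cog": 14, "sog": 90, "rot": 24,
                    "alt_rate": 30, "hour": 12, "dow": 0, "month": 0, "uncert": 3})
```

Because every field maps to the same d_model space, the sum keeps sequence width constant no matter how many fields are fused.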

---

## 6. Training Recipe

### 6.1 Pretraining: Next-State Prediction (Causal LM)

**Objective**: Given states 1..T, predict the state at T+1 (applied autoregressively at every position).

**Loss**:
```
L = Σ_{t=1}^{T-1} [ λ_geo  · CE(ŷ_geo_t,      y_geo_{t+1})
                  + λ_COG  · CE(ŷ_COG_t,      y_COG_{t+1})
                  + λ_SOG  · CE(ŷ_SOG_t,      y_SOG_{t+1})
                  + λ_ROT  · CE(ŷ_ROT_t,      y_ROT_{t+1})
                  + λ_alt  · CE(ŷ_alt_rate_t, y_alt_rate_{t+1})
                  + λ_altb · CE(ŷ_alt_band_t, y_alt_band_{t+1}) ]

λ values default to 1.0 (equal weighting).
```
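In numpy terms, the summed multi-task loss at a single position looks like this; the dict-based interface and feature names are illustrative:

```python
import numpy as np

def cross_entropy(logits, target):
    """CE for one position: -log softmax(logits)[target], computed stably."""
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -logp[target]

def next_state_loss(head_logits, targets, lambdas=None):
    """Sum of per-feature CE terms. `head_logits` maps feature name -> logits
    at position t; `targets` holds the ground-truth bin ids at t+1."""
    lambdas = lambdas or {k: 1.0 for k in head_logits}
    return sum(lambdas[k] * cross_entropy(head_logits[k], targets[k])
               for k in head_logits)

logits = {"geo": np.array([2.0, 0.1, -1.0]), "cog": np.array([0.0, 3.0])}
loss = next_state_loss(logits, {"geo": 0, "cog": 1})
```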

**Training hyperparameters** (based on FTP-LLM + H3-CLM):

| Parameter             | Value               |
|-----------------------|---------------------|
| Optimizer             | AdamW               |
| Learning rate         | 5e-4                |
| LR schedule           | Cosine + 5% warmup  |
| Batch size (per GPU)  | 64                  |
| Gradient accumulation | 4 (effective = 256) |
| Max epochs            | 30 (early stop p=5) |
| Weight decay          | 0.01                |
| Gradient clipping     | 1.0                 |
| Mixed precision       | bf16                |

**Data windowing**: Sliding windows of size 128 with stride 64 (50% overlap).
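The windowing scheme in pure Python; in this sketch, a trailing remainder shorter than the window is simply not emitted:

```python
def window_starts(n_points, size=128, stride=64):
    """Start indices of sliding windows over one trajectory
    (size 128, stride 64 = 50% overlap)."""
    return list(range(0, n_points - size + 1, stride))

starts = window_starts(300)   # a 300-step (~5-hour) trajectory
```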

### 6.2 Downstream: Activity Classification

After pretraining, attach a classification head:
```
h_BOS → Linear(256, 128) → GELU → Dropout(0.1) → Linear(128, N_classes)
```

**Fine-tuning options**:
- **A**: Freeze backbone, train head only (fast; works with small data)
- **B**: Full fine-tune, backbone lr=1e-5, head lr=1e-3

---

## 7. Dataset Strategy

### 7.1 Prototyping: `traffic` Python Library

```python
from traffic.data.samples import landing_zurich_2019
# ~2,000 flights near Zurich
# Columns: timestamp, icao24, callsign, latitude, longitude, altitude,
#          groundspeed, track, vertical_rate, ...
```

Instant access, clean, well-documented. Single airport, so limited diversity.

### 7.2 Training: OpenSky Network

```python
from pyopensky.trino import Trino

trino = Trino()
df = trino.rawquery("""
    SELECT time, icao24, lat, lon, baroaltitude, velocity, heading, vertrate
    FROM state_vectors_data4
    WHERE hour >= '2024-01-15 00:00:00'
      AND hour < '2024-01-15 12:00:00'
      AND lat BETWEEN 40 AND 55
      AND lon BETWEEN -10 AND 20
    ORDER BY icao24, time
""")
```

**Target**:
- **Region A** (train): Europe, 1 month → ~500K-1M flights
- **Region B** (OOD test): US CONUS, 1 week → ~200K flights
- **Region C** (far test): East Asia, 1 week → ~100K flights

### 7.3 Alternative: SCAT Dataset

~170K en-route flights over Sweden, hosted on Zenodo. Pre-segmented and clean.

### 7.4 Data Split

```
Training:   70% of Region A flights
Validation: 15% of Region A flights
Test (IID): 15% of Region A flights
Test (OOD): 100% of Region B flights
Test (Far): 100% of Region C flights
```

Split by **flight** (not by time window) to avoid data leakage between sets.
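Flight-level splitting can be sketched as follows; the fixed seed and the 70/15/15 fractions follow the split above, and `flight_ids` would in practice be keys like ICAO24 + departure time (an assumption of this sketch):

```python
import random

def split_flights(flight_ids, seed=0):
    """70/15/15 split at the flight level, so no flight's windows can leak
    across train/validation/test."""
    ids = sorted(set(flight_ids))
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train, n_val = round(0.70 * n), round(0.15 * n)
    return (set(ids[:n_train]),
            set(ids[n_train:n_train + n_val]),
            set(ids[n_train + n_val:]))

train, val, test = split_flights([f"icao{i}" for i in range(1000)])
```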

---

## 8. Ablation Study: Geohash Geographic Dependency

### 8.1 Hypothesis

> Geohash embeddings encode **absolute geographic position**, causing the model to memorize region-specific patterns (airways, approach paths, airspace structure). This improves in-distribution performance but degrades transfer to unseen regions.

### 8.2 Experimental Variants

| Variant | Geohash Type | Description |
|---------|--------------|-------------|
| **V1: Full Model** | H3 absolute | Complete architecture as described |
| **V2: No Geohash** | None | Remove geohash entirely; the model sees only kinematics + temporal + uncertainty |
| **V3: Relative Geohash** | H3 relative | H3 cell of (Δlat, Δlon) from the trajectory start; position-invariant |
| **V4: Multi-Resolution** | H3 res 3+5+7 | Three resolutions summed (coarse → fine) |
| **V5: Continuous Position** | Linear projection | `Linear([lat, lon, alt] → d_model)`; no discretization |

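The core of variant V3 is position invariance. A minimal illustration of the idea (the full variant would then re-index these offsets with H3 around a fixed reference origin, which is an assumption of this sketch):

```python
import numpy as np

def relative_track(lats, lons):
    """V3: offsets (dlat, dlon) from the trajectory start, so two identical
    maneuvers flown in Europe and in CONUS produce identical tokens."""
    lats, lons = np.asarray(lats, dtype=float), np.asarray(lons, dtype=float)
    return lats - lats[0], lons - lons[0]

# The same right-hand offset flown near Zurich and near Philadelphia
dlat_eu, dlon_eu = relative_track([47.0, 47.1, 47.2], [8.0, 8.0, 8.1])
dlat_us, dlon_us = relative_track([40.0, 40.1, 40.2], [-75.0, -75.0, -74.9])
```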
### 8.3 Evaluation Metrics

For each variant × each test set (IID, OOD, Far):

| Metric | Description |
|--------|-------------|
| Geo Accuracy | % of correct H3 cell predictions |
| Position MAE | Mean absolute position error in km |
| COG MAE | Heading error in degrees |
| SOG MAE | Speed error in knots |
| Multi-step ADE | Average displacement error over 5 predicted steps |
| Multi-step FDE | Final displacement error at step 5 |

### 8.4 Key Comparisons

| Comparison | Tests |
|------------|-------|
| V1 vs V2 (IID) | How much geohash helps when the test region equals the train region |
| V1 vs V2 (OOD) | If V2 > V1 on OOD → geohash causes geographic overfitting |
| V1 vs V3 (OOD) | If V3 is strong on both IID and OOD → relative geohash is the sweet spot |
| V4 (all) | Multi-resolution: do coarse cells transfer while fine cells specialize? |
| V5 (all) | Does continuous encoding avoid discretization issues? |

### 8.5 Expected Outcomes

- **V1**: Best IID, worst OOD (per the hypothesis)
- **V3**: Best compromise; the predicted winner
- **V5**: May struggle (loses the discrete token structure transformers excel at)
- **V2**: Strong OOD baseline, but sacrifices IID performance

### 8.6 Additional Analysis

- **Attention visualization**: V1 vs V3 attention patterns
- **Embedding clustering**: t-SNE of geohash embeddings colored by region
- **Learning curves**: IID vs OOD performance vs. training data size

---

## 9. Implementation Phases

### Phase 1: Data Pipeline (Week 1)
- Set up the `traffic` library, extract sample trajectories
- Implement feature derivation (COG, SOG, ROT, alt_rate)
- Implement H3 geohash encoding + altitude banding
- Implement feature discretization (binning)
- Implement uncertainty score computation
- Build a PyTorch Dataset class with sliding windows
- Unit tests for all derivation functions

### Phase 2: Model Architecture (Weeks 1-2)
- Implement all embedding tables
- Implement the additive fusion layer
- Implement prompt token prepending
- Implement the decoder-only transformer backbone
- Implement the multi-head output (6 prediction heads)
- Implement the classification head (for downstream tasks)
- Forward-pass test with dummy data

### Phase 3: Pretraining (Weeks 2-3)
- Implement the training loop with multi-task loss
- Prototyping run on `traffic` data (small, fast iteration)
- Scale to OpenSky data
- Monitor loss curves, validate convergence
- Save the best checkpoint

### Phase 4: Downstream Adaptation (Weeks 3-4)
- Implement the classification fine-tuning pipeline
- Test on the activity classification task
- Compare frozen vs. fine-tuned backbone

### Phase 5: Ablation Study (Weeks 4-5)
- Implement all 5 geohash variants
- Train each variant with identical hyperparameters
- Evaluate on the IID, OOD, and Far test sets
- Generate comparison tables and visualizations
- Write up the analysis of geographic dependency findings

---

## 10. Key Design Decisions & Rationale

| Decision | Choice | Why |
|----------|--------|-----|
| Custom model vs. pretrained LLM | Custom ~10M param transformer | FTP-LLM showed text-tokenized LLMs work, but a custom model allows proper multi-feature fusion and trains in hours at 10M params. |
| H3 vs. traditional geohash | H3 | Near-uniform hexagonal cells, no polar distortion, hierarchical. Proven by H3-CLM. |
| Additive vs. concatenative fusion | Additive | BERT/TrAISFormer paradigm. Keeps d_model constant; concatenation would give d_model × N_features, which is massive. |
| 60 s time resolution | 60 seconds | FTP-LLM validated 1-min aggregation; 128 steps ≈ 2+ hours. |
| Factored geohash (H3 + alt) | Separate tables, summed | Avoids combinatorial explosion (9.2M → 50K + 46). |
| Multi-head output | Separate softmax per feature | More interpretable; allows per-feature analysis. |
| Uncertainty from smoothness | Variance-based | Computable at data-preparation time; no inference overhead. |

---

## 11. Risk Analysis

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Geohash overfits to region | High | High | Ablation study; V3 (relative) is the fallback |
| OpenSky access issues | Medium | High | Fallback: `traffic` samples + SCAT |
| 60 s too coarse for terminal areas | Medium | Low | Separate terminal-area model at 10 s |
| Model too small | Low | Medium | Scale up: d_model → 512, n_layers → 16 (~40M) |
| Altitude discretization too coarse | Low | Low | Refine to 500 ft bands (92 bands) |

---

## 12. Monitoring & Evaluation

**During training** (Trackio):
- Total loss + per-feature loss curves
- Validation loss each epoch
- LR schedule, GPU utilization

**After training**:
- Next-state accuracy (top-1, top-5 per feature)
- Position error in km
- Multi-step prediction (1, 5, 10, 20 steps ahead)
- Downstream classification F1 / precision / recall

---

*Grounded in: FTP-LLM, H3-CLM, GeoFormer, TrAISFormer, and LLM4STP (reconstructed). Ready for implementation upon approval.*