AirTrackLM / ARCHITECTURE.md

AirTrackLM: LLM4STP Adapted for ADS-B Air Track Prediction

Complete Architecture & Implementation Plan


1. Executive Summary

We adapt the LLM4STP multi-feature fusion architecture (originally for maritime AIS ship trajectory prediction) to work with ADS-B air track data. The model uses a decoder-only transformer with four specialized embedding types — Prompt, Uncertainty, Geohash, and Temporal — fused together for next-state prediction pretraining. Once pretrained, the model is adaptable to downstream tasks like activity classification.

This design is grounded in published results from:

  • FTP-LLM (arXiv:2501.17459) β€” LLaMA-3.1-8B for flight trajectory prediction
  • H3-CLM (arXiv:2405.09596) β€” H3 geohash + causal LM for maritime trajectories
  • GeoFormer (arXiv:2311.05092) β€” GPT-style geospatial tokenization
  • TrAISFormer (arXiv:2109.03958) β€” Discrete tokenization of AIS features

2. System Architecture Overview

RAW ADS-B INPUT
  (timestamp, latitude, longitude, altitude)
        │
        ▼
FEATURE DERIVATION PIPELINE
  Raw:     lat, lon, alt
  Derived: COG, SOG, ROT, altitude_rate
  Meta:    timestamp → (hour, day_of_week, month)
  Output per timestep:
    state_t = [lat, lon, alt, COG, SOG, ROT, alt_rate]
        │
        ▼
TOKENIZATION / ENCODING
  Geohash tokenizer:      lat, lon, alt → H3 cell + alt_band
  Continuous discretizer: COG, SOG, ROT, alt_rate → bin IDs
  Temporal encoder:       hour, dow, month → time IDs
  Each ID stream feeds its own embedding table (d_model)
        │
        ▼
EMBEDDING FUSION LAYER
  E_state = E_geo + E_feat + E_temp + E_uncert
  Prompt embedding prefix (prepended): [PROMPT_1, PROMPT_2, ..., PROMPT_k]
  Input sequence: [PROMPT_TOKENS | STATE_1 | STATE_2 | ... | STATE_T]
  Linear projection → d_model
  + positional encoding (sinusoidal)
        │
        ▼
DECODER-ONLY TRANSFORMER BACKBONE (×N_layers)
  Causal multi-head self-attention
    (masked: each position attends only to itself and earlier positions)
  LayerNorm + residual connection
  Feed-forward network (Linear → GELU → Linear, d_model → 4·d_model → d_model)
  LayerNorm + residual connection
        │
        ▼
OUTPUT HEADS
  PRETRAINING: next-state prediction head
    For each position t, predict the state at t+1:
      h_t → Linear → softmax → P(geohash_token_{t+1})
      h_t → Linear → softmax → P(COG_bin_{t+1})
      h_t → Linear → softmax → P(SOG_bin_{t+1})
      h_t → Linear → softmax → P(ROT_bin_{t+1})
      h_t → Linear → softmax → P(alt_rate_bin_{t+1})
      h_t → Linear → softmax → P(alt_band_{t+1})
    Loss = Σ CrossEntropy(predicted_feature, true_feature)
  DOWNSTREAM: activity classification head
    (attached after pretraining; backbone frozen or fine-tuned)
    h_[BOS] or mean(h_1:T) → MLP → softmax → class label

3. The Four Embedding Types (Detailed)

3.1 Geohash Embeddings β€” Spatial Position Encoding

Purpose: Encode the aircraft's 3D geographic position as a discrete token.

Method: We use the H3 hexagonal hierarchical spatial index (Uber's H3) at resolution 5 (hex area ≈ 252 km², edge ≈ 9.85 km) for en-route flight, with an option to use resolution 7 (≈ 5.16 km², edge ≈ 1.22 km) for terminal areas. This follows the H3-CLM paper's approach, adapted for aviation's larger spatial scale.

3D Extension: Since aircraft operate in 3D, we combine the H3 cell with an altitude band:

Geohash Token = H3_cell_index × N_alt_bands + alt_band_index

Altitude bands (1000 ft increments):
  Band 0:     0 - 1,000 ft    (ground / taxi)
  Band 1:  1,000 - 2,000 ft   (initial climb / approach)
  ...
  Band 45: 45,000 - 46,000 ft (high cruise)

  N_alt_bands = 46

Vocabulary size: At H3 resolution 5, the number of unique cells covering typical airspace is 100K-200K. With altitude bands: `200K × 46 ≈ 9.2M` — too large for direct embedding.

Solution β€” Factored Embedding:

E_geohash = E_h3[h3_cell_id] + E_alt[alt_band_id]

E_h3:  learned embedding table, vocab = N_h3_cells (~200K or hashing trick to 50K)
E_alt: learned embedding table, vocab = 46

Both project to d_model dimensions.

The hashing trick: Map H3 cell indices through a hash function to a fixed vocabulary of ~50,000 buckets. This bounds memory while maintaining spatial discrimination.

Why H3 over traditional geohash: H3 hexagons have uniform area (no polar distortion), hierarchical nesting, and consistent neighbor relationships — critical for trajectory continuity.
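The factored lookup can be sketched in a few lines of NumPy. The randomly initialized tables stand in for learned parameters, and the plain modulo is an illustrative stand-in for a proper mixing hash; in practice the cell index would come from the h3 library rather than being passed in directly.

```python
import numpy as np

D_MODEL = 256
N_HASH_BUCKETS = 50_000   # hashing-trick vocabulary for H3 cells
N_ALT_BANDS = 46          # 1,000 ft bands covering 0 - 46,000 ft

rng = np.random.default_rng(0)
E_h3 = rng.normal(0.0, 0.02, (N_HASH_BUCKETS, D_MODEL))   # hashed H3 table
E_alt = rng.normal(0.0, 0.02, (N_ALT_BANDS, D_MODEL))     # altitude-band table

def geohash_embedding(h3_cell_id: int, altitude_ft: float) -> np.ndarray:
    """Factored 3D position embedding: hashed H3 cell + altitude band."""
    bucket = h3_cell_id % N_HASH_BUCKETS                   # hashing trick
    band = int(min(max(altitude_ft // 1000, 0), N_ALT_BANDS - 1))
    return E_h3[bucket] + E_alt[band]                      # shape (d_model,)
```

Because the two tables are summed rather than crossed, memory stays at 50,046 rows instead of the ~9.2M a joint vocabulary would need.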

3.2 Temporal Embeddings β€” When Is the Aircraft Flying?

Purpose: Encode temporal context β€” time of day affects traffic density, routes, and behavior.

Method: Additive composition of multiple temporal scales:

E_temporal = E_hour[hour_of_day] + E_dow[day_of_week] + E_month[month]

E_hour:  24 entries  (captures rush hour vs. night patterns)
E_dow:    7 entries  (weekday vs. weekend traffic)
E_month: 12 entries  (seasonal routes, weather patterns)

All project to d_model dimensions.

Optional — sinusoidal sub-minute encoding: For sub-minute resolution:

E_minute = [sin(2π × minute / 60), cos(2π × minute / 60)]  → linear → d_model
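The additive temporal composition above can be sketched as follows; the randomly initialized tables are placeholders for learned embeddings:

```python
import numpy as np
from datetime import datetime, timezone

D_MODEL = 256
rng = np.random.default_rng(0)
E_hour = rng.normal(0.0, 0.02, (24, D_MODEL))    # hour of day
E_dow = rng.normal(0.0, 0.02, (7, D_MODEL))      # day of week
E_month = rng.normal(0.0, 0.02, (12, D_MODEL))   # month

def temporal_embedding(unix_ts: float) -> np.ndarray:
    """Additive temporal embedding: hour-of-day + day-of-week + month."""
    t = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return E_hour[t.hour] + E_dow[t.weekday()] + E_month[t.month - 1]
```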

3.3 Uncertainty Embeddings — How Confident Are We?

Purpose: Encode the model's uncertainty about the current trajectory state. Aircraft in straight-and-level cruise have low uncertainty; aircraft maneuvering near airports have high uncertainty.

Method: Compute a trajectory smoothness score from recent states, then discretize:

Uncertainty sources (sliding window of k=5 recent states):

1. Position variance:  σ²_pos = var(Δlat) + var(Δlon)
2. Heading variance:   σ²_COG = circular_var(COG_{t-k:t})
3. Speed variance:     σ²_SOG = var(SOG_{t-k:t})
4. Altitude variance:  σ²_alt = var(alt_rate_{t-k:t})

Combined uncertainty score:
  U_t = w1·σ²_pos + w2·σ²_COG + w3·σ²_SOG + w4·σ²_alt

Discretize into N_uncert = 16 bins (quantile binning on training data)

E_uncertainty = E_uncert_table[bin(U_t)]   →  d_model

Weights w1-w4: Hyperparameters tuned on validation data, or learned as part of the model.

During inference: For multi-step prediction, uncertainty can be updated using MC-Dropout or ensemble disagreement.
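The smoothness score can be computed directly from the sliding window. This sketch uses the standard "1 minus mean resultant length" definition of circular variance for the heading term, and equal weights as an illustrative default:

```python
import numpy as np

def circular_var(deg):
    """Circular variance of angles in degrees: 1 - |mean resultant vector|."""
    rad = np.radians(deg)
    return 1.0 - np.hypot(np.mean(np.cos(rad)), np.mean(np.sin(rad)))

def uncertainty_score(lats, lons, cogs, sogs, alt_rates, w=(1.0, 1.0, 1.0, 1.0)):
    """Combined smoothness score over the last k states (arrays of shape (k,))."""
    u = (w[0] * (np.var(np.diff(lats)) + np.var(np.diff(lons)))  # position
         + w[1] * circular_var(cogs)                             # heading
         + w[2] * np.var(sogs)                                   # speed
         + w[3] * np.var(alt_rates))                             # vertical rate
    return float(u)
```

Straight-and-level cruise (constant heading, speed, and climb rate) scores near zero; a turning aircraft scores higher, and the scores are then quantile-binned into the 16 uncertainty bins.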

3.4 Prompt Embeddings — Task and Context Metadata

Purpose: Provide metadata context about the flight, analogous to system prompts in LLMs. Enables task conditioning and multi-task learning.

Method: Learnable prompt tokens prepended to the trajectory:

Prompt token vocabulary:
  - Aircraft category:  [HEAVY, LARGE, SMALL, ROTORCRAFT, GLIDER, UAV, UNKNOWN]  (7)
  - Flight phase:       [CLIMB, CRUISE, DESCENT, APPROACH, GROUND, UNKNOWN]       (6)
  - Region:             [CONUS, EUROPE, ASIA, OTHER]                               (4)
  - Task:               [PREDICT, CLASSIFY, DETECT_ANOMALY]                        (3)
  - Special:            [BOS, EOS, PAD, MASK]                                      (4)

Total prompt vocab: ~24 tokens

Prompt sequence (prepended):
  [BOS, TASK_TOKEN, AIRCRAFT_TOKEN, PHASE_TOKEN, REGION_TOKEN]

Each has a learned embedding of dimension d_model.

For downstream classification: Change TASK_TOKEN to CLASSIFY; output at BOS position is used for classification.
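A possible token-id mapping for this vocabulary; the UNKNOWN_AC / UNKNOWN_PHASE suffixes are an assumption made here to disambiguate the two UNKNOWN entries while keeping the total at 24 tokens:

```python
PROMPT_VOCAB = {tok: i for i, tok in enumerate([
    "BOS", "EOS", "PAD", "MASK",                                   # special (4)
    "PREDICT", "CLASSIFY", "DETECT_ANOMALY",                       # task (3)
    "HEAVY", "LARGE", "SMALL", "ROTORCRAFT",
    "GLIDER", "UAV", "UNKNOWN_AC",                                 # aircraft (7)
    "CLIMB", "CRUISE", "DESCENT", "APPROACH",
    "GROUND", "UNKNOWN_PHASE",                                     # phase (6)
    "CONUS", "EUROPE", "ASIA", "OTHER",                            # region (4)
])}

def prompt_ids(task="PREDICT", aircraft="UNKNOWN_AC",
               phase="UNKNOWN_PHASE", region="OTHER"):
    """Build the prepended prefix [BOS, TASK, AIRCRAFT, PHASE, REGION]."""
    return [PROMPT_VOCAB[t] for t in ("BOS", task, aircraft, phase, region)]
```

Switching to the downstream task is then just `prompt_ids(task="CLASSIFY")`.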


4. Feature Derivation Pipeline

4.1 Raw Input

timestamp (Unix epoch seconds)
latitude  (degrees, WGS84)
longitude (degrees, WGS84)
altitude  (feet, barometric or geometric)

4.2 Derived Features

import numpy as np

def derive_features(timestamps, lats, lons, alts):
    """
    Derive COG, SOG, ROT, and altitude rate from raw position data.
    All inputs: numpy arrays of shape (N,) for a single trajectory.
    Returns arrays of shape (N,) β€” first element is NaN.
    """
    dt = np.diff(timestamps)  # seconds
    dt = np.maximum(dt, 1e-6)  # avoid division by zero
    
    # --- Course Over Ground (COG) ---
    lat1, lat2 = np.radians(lats[:-1]), np.radians(lats[1:])
    dlon = np.radians(np.diff(lons))
    
    x = np.sin(dlon) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(dlon)
    COG = np.degrees(np.arctan2(x, y)) % 360  # [0, 360)
    
    # --- Speed Over Ground (SOG) ---
    dlat = np.radians(np.diff(lats))
    a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    distance_nm = 3440.065 * c  # Earth radius in nautical miles
    SOG = distance_nm / (dt / 3600)  # knots
    
    # --- Rate of Turn (ROT) ---
    dCOG = np.diff(COG)
    dCOG = (dCOG + 180) % 360 - 180  # normalize to [-180, 180]
    ROT = np.full(len(lats), np.nan)
    ROT[2:] = dCOG / dt[1:]  # degrees per second
    
    # --- Rate of Altitude Change ---
    dalt = np.diff(alts)  # feet
    alt_rate = dalt / (dt / 60)  # feet per minute
    
    # Pad first elements
    COG_full = np.concatenate([[np.nan], COG])
    SOG_full = np.concatenate([[np.nan], SOG])
    alt_rate_full = np.concatenate([[np.nan], alt_rate])
    
    return COG_full, SOG_full, ROT, alt_rate_full

4.3 Feature Discretization

| Feature | Range | Bin Width | N_bins | Notes |
|---------|-------|-----------|--------|-------|
| COG | [0, 360)° | 5° | 72 | Circular |
| SOG | [0, 600] kts | 5 knots | 121 | Capped at ~Mach 1 |
| ROT | [-6, 6] °/s | 0.25 °/s | 49 | Capped at ±6 °/s |
| Altitude rate | [-6000, 6000] fpm | 200 ft/min | 61 | Capped at ±6000 fpm |

Outliers beyond caps clipped to boundary bin.
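The clip-then-floor scheme in the table can be sketched as follows; the `make_binner` helper is a hypothetical convenience, and the resulting bin counts match the table, with boundary and out-of-range values landing in the last bin:

```python
import numpy as np

def make_binner(lo, hi, width, circular=False):
    """Uniform binner with clipping; outliers land in the boundary bins."""
    def to_bin(x):
        x = np.asarray(x, dtype=float)
        if circular:
            x = x % (hi - lo) + lo   # wrap angles into [lo, hi)
        x = np.clip(x, lo, hi)       # outliers -> boundary
        return ((x - lo) // width).astype(int)
    return to_bin

cog_bin = make_binner(0, 360, 5, circular=True)   # bins 0..71  (72)
sog_bin = make_binner(0, 600, 5)                  # bins 0..120 (121)
rot_bin = make_binner(-6, 6, 0.25)                # bins 0..48  (49)
alt_rate_bin = make_binner(-6000, 6000, 200)      # bins 0..60  (61)
```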

4.4 Trajectory Preprocessing Pipeline

1. Segment raw ADS-B by ICAO24 + temporal gaps > 15 min → individual flights
2. Resample to fixed Δt = 60 seconds (linear interp for position, circular for heading)
3. Derive features (COG, SOG, ROT, alt_rate)
4. Drop first 2 points per trajectory (NaN from derivation)
5. Filter: remove trajectories with < 20 points (< 20 minutes)
6. Compute H3 cell (res 5) + altitude band for each point
7. Discretize all continuous features into bins
8. Compute uncertainty scores (sliding window k=5)
9. Extract temporal features (hour, dow, month)
10. Construct prompt tokens from metadata (if available)
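Step 1 (gap-based segmentation) can be sketched as follows, assuming the timestamps for a single ICAO24 are already sorted:

```python
import numpy as np

def split_flights(timestamps, gap_s=15 * 60):
    """Split one aircraft's sorted ADS-B times into flights at gaps > 15 min.
    Returns a list of index arrays, one per flight segment."""
    ts = np.asarray(timestamps, dtype=float)
    breaks = np.where(np.diff(ts) > gap_s)[0] + 1   # first index after each gap
    return np.split(np.arange(len(ts)), breaks)
```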

5. Model Hyperparameters

5.1 Model Dimensions

| Parameter | Value | Rationale |
|-----------|-------|-----------|
| d_model | 256 | H3-CLM found 256-1024 effective |
| n_heads | 8 | head_dim = 32 |
| n_layers | 8 | Moderate depth for a ~10M-param model |
| d_ff | 1024 | 4× d_model (standard) |
| max_seq_len | 128 | 128 states × 60 s ≈ 2 hours of flight |
| n_prompt_tokens | 5 | [BOS, TASK, AIRCRAFT, PHASE, REGION] |
| dropout | 0.1 | |

Total parameters: ~8-12M (trainable on a single GPU in hours)

5.2 Vocabulary Sizes

| Embedding | Vocab | Dim |
|-----------|-------|-----|
| H3 cells | 50,000 | 256 |
| Altitude bands | 46 | 256 |
| COG bins | 72 | 256 |
| SOG bins | 121 | 256 |
| ROT bins | 49 | 256 |
| Alt rate bins | 61 | 256 |
| Hour of day | 24 | 256 |
| Day of week | 7 | 256 |
| Month | 12 | 256 |
| Uncertainty bins | 16 | 256 |
| Prompt tokens | 24 | 256 |

5.3 State Token Composition

Each timestep → a single state token via additive fusion:

E_state_t = E_h3[h3_id_t] + E_alt_band[alt_band_t]            # Geohash (3D position)
          + E_COG[cog_bin_t] + E_SOG[sog_bin_t]                # Kinematics
          + E_ROT[rot_bin_t] + E_alt_rate[alt_rate_bin_t]       # Dynamics
          + E_hour[hour_t] + E_dow[dow_t] + E_month[month_t]   # Temporal
          + E_uncert[uncert_bin_t]                               # Uncertainty

E_state_t ∈ R^{d_model}

This additive fusion follows BERT (token + segment + position) and TrAISFormer.


6. Training Recipe

6.1 Pretraining: Next-State Prediction (Causal LM)

Objective: Given states 1..T, predict state at T+1 (applied autoregressively at every position).

Loss:

L = Σ_{t=1}^{T-1} [ λ_geo · CE(ŷ_geo_t, y_geo_{t+1})
                    + λ_COG · CE(ŷ_COG_t, y_COG_{t+1})
                    + λ_SOG · CE(ŷ_SOG_t, y_SOG_{t+1})
                    + λ_ROT · CE(ŷ_ROT_t, y_ROT_{t+1})
                    + λ_alt · CE(ŷ_alt_rate_t, y_alt_rate_{t+1})
                    + λ_altb · CE(ŷ_alt_band_t, y_alt_band_{t+1}) ]

λ values default to 1.0 (equal weighting).
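The multi-task loss can be sketched in NumPy with a numerically stable log-softmax; the head names and dict-based interface are illustrative, not part of the architecture:

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean CE over positions; logits (T, V), targets (T,) of class ids."""
    z = logits - logits.max(axis=-1, keepdims=True)          # stability shift
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True)) # log-softmax
    return -logp[np.arange(len(targets)), targets].mean()

def multi_task_loss(head_logits, next_targets, lambdas=None):
    """Weighted sum of per-feature CE terms, keyed by feature name."""
    lambdas = lambdas or {k: 1.0 for k in head_logits}
    return sum(lambdas[k] * cross_entropy(head_logits[k], next_targets[k])
               for k in head_logits)
```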

Training hyperparameters (based on FTP-LLM + H3-CLM):

| Parameter | Value |
|-----------|-------|
| Optimizer | AdamW |
| Learning rate | 5e-4 |
| LR schedule | Cosine with 5% warmup |
| Batch size (per GPU) | 64 |
| Gradient accumulation | 4 (effective batch = 256) |
| Max epochs | 30 (early stopping, patience = 5) |
| Weight decay | 0.01 |
| Gradient clipping | 1.0 |
| Mixed precision | bf16 |

Data windowing: Sliding window size=128, stride=64 (50% overlap).

6.2 Downstream: Activity Classification

After pretraining, attach classification head:

h_BOS → Linear(256, 128) → GELU → Dropout(0.1) → Linear(128, N_classes)

Fine-tuning options:

  • A: Freeze backbone, train head only (fast, small data)
  • B: Full fine-tune, backbone lr=1e-5, head lr=1e-3

7. Dataset Strategy

7.1 Prototyping β€” traffic Python Library

from traffic.data.samples import landing_zurich_2019
# ~2,000 flights near Zurich
# Columns: timestamp, icao24, callsign, latitude, longitude, altitude,
#          groundspeed, track, vertical_rate, ...

Pros: instant access, clean, well-documented. Cons: single airport, limited diversity.

7.2 Training β€” OpenSky Network

from pyopensky.trino import Trino
trino = Trino()
df = trino.rawquery("""
    SELECT time, icao24, lat, lon, baroaltitude, velocity, heading, vertrate
    FROM state_vectors_data4
    WHERE hour >= '2024-01-15 00:00:00'
      AND hour <  '2024-01-15 12:00:00'
      AND lat BETWEEN 40 AND 55
      AND lon BETWEEN -10 AND 20
    ORDER BY icao24, time
""")

Target:

  • Region A (train): Europe, 1 month β†’ ~500K-1M flights
  • Region B (OOD test): US CONUS, 1 week β†’ ~200K flights
  • Region C (far test): East Asia, 1 week β†’ ~100K flights

7.3 Alternative: SCAT Dataset

~170K en-route flights over Sweden, hosted on Zenodo. Pre-segmented and clean.

7.4 Data Split

Training:    70% of Region A flights
Validation:  15% of Region A flights  
Test (IID):  15% of Region A flights
Test (OOD):  100% of Region B flights
Test (Far):  100% of Region C flights

Split by flight (not time window) to avoid data leakage.
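One way to make the by-flight split deterministic and leakage-free is to hash the flight id; the seed string is a hypothetical choice, and the thresholds follow the 70/15/15 split above:

```python
import hashlib

def flight_split(flight_id: str, seed: str = "airtracklm") -> str:
    """Deterministic 70/15/15 partition by flight (never by window), so
    overlapping windows from one flight all land in the same partition."""
    h = int(hashlib.sha256(f"{seed}:{flight_id}".encode()).hexdigest(), 16)
    r = (h % 10_000) / 10_000   # pseudo-uniform in [0, 1)
    return "train" if r < 0.70 else "val" if r < 0.85 else "test_iid"
```

Using SHA-256 rather than Python's built-in `hash` keeps the split stable across interpreter runs.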


8. Ablation Study: Geohash Geographic Dependency

8.1 Hypothesis

Geohash embeddings encode absolute geographic position, causing the model to memorize region-specific patterns (airways, approach paths, airspace structure). This improves in-distribution performance but degrades transfer to unseen regions.

8.2 Experimental Variants

| Variant | Geohash Type | Description |
|---------|--------------|-------------|
| V1: Full Model | H3 absolute | Complete architecture as described |
| V2: No Geohash | None | Remove geohash entirely; the model sees only kinematics + temporal + uncertainty |
| V3: Relative Geohash | H3 relative | H3 cell of (Δlat, Δlon) from the trajectory start; position-invariant |
| V4: Multi-Resolution | H3 res 3+5+7 | Three resolutions summed (coarse to fine) |
| V5: Continuous Position | Linear projection | Linear([lat, lon, alt] → d_model); no discretization |
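Variant V3 changes only the tokenizer input: positions are re-anchored to a fixed origin before H3 encoding. A minimal sketch, where the (0, 0) anchor is an illustrative assumption:

```python
import numpy as np

def relative_track(lats, lons):
    """V3: re-anchor a trajectory at a fixed reference origin so its H3
    cells encode displacement from the start, not absolute position."""
    REF_LAT, REF_LON = 0.0, 0.0                             # fixed anchor
    dlat = np.asarray(lats, dtype=float) - lats[0]
    dlon = (np.asarray(lons, dtype=float) - lons[0] + 180) % 360 - 180  # wrap
    return REF_LAT + dlat, REF_LON + dlon   # feed these into the H3 tokenizer
```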

8.3 Evaluation Metrics

For each variant × each test set (IID, OOD, Far):

| Metric | Description |
|--------|-------------|
| Geo accuracy | % of correct H3 cell predictions |
| Position MAE | Mean absolute error in km |
| COG MAE | Heading error in degrees |
| SOG MAE | Speed error in knots |
| Multi-step ADE | Average displacement error over 5 predicted steps |
| Multi-step FDE | Final displacement error at step 5 |

8.4 Key Comparisons

| Comparison | Tests |
|------------|-------|
| V1 vs V2 (IID) | How much geohash helps when the test region equals the training region |
| V1 vs V2 (OOD) | If V2 > V1 on OOD, geohash causes geographic overfitting |
| V1 vs V3 (OOD) | If V3 does well on both IID and OOD, relative geohash is the sweet spot |
| V4 (all) | Whether coarse cells transfer while fine cells specialize |
| V5 (all) | Whether continuous encoding avoids discretization issues |

8.5 Expected Outcomes

  • V1: Best IID, worst OOD (hypothesis)
  • V3: Best compromise β€” predicted winner
  • V5: May struggle (loses discrete token structure transformers excel at)
  • V2: Strong OOD baseline, sacrifices IID

8.6 Additional Analysis

  • Attention visualization: V1 vs V3 attention patterns
  • Embedding clustering: t-SNE of geohash embeddings colored by region
  • Learning curves: IID vs OOD performance vs training data size

9. Implementation Phases

Phase 1: Data Pipeline (Week 1)

  • Set up traffic library, extract sample trajectories
  • Implement feature derivation (COG, SOG, ROT, alt_rate)
  • Implement H3 geohash encoding + altitude banding
  • Implement feature discretization (binning)
  • Implement uncertainty score computation
  • Build PyTorch Dataset class with sliding window
  • Unit tests for all derivation functions

Phase 2: Model Architecture (Week 1-2)

  • Implement all embedding tables
  • Implement additive fusion layer
  • Implement prompt token prepending
  • Implement decoder-only transformer backbone
  • Implement multi-head output (6 prediction heads)
  • Implement classification head (for downstream)
  • Forward pass test with dummy data

Phase 3: Pretraining (Week 2-3)

  • Implement training loop with multi-task loss
  • Prototyping run on traffic data (small, fast iteration)
  • Scale to OpenSky data
  • Monitor loss curves, validate convergence
  • Save best checkpoint

Phase 4: Downstream Adaptation (Week 3-4)

  • Implement classification fine-tuning pipeline
  • Test on activity classification task
  • Compare frozen vs. fine-tuned backbone

Phase 5: Ablation Study (Week 4-5)

  • Implement all 5 geohash variants
  • Train each variant with identical hyperparameters
  • Evaluate on IID, OOD, and Far test sets
  • Generate comparison tables and visualizations
  • Write analysis of geographic dependency findings

10. Key Design Decisions & Rationale

| Decision | Choice | Why |
|----------|--------|-----|
| Custom model vs. pretrained LLM | Custom ~10M-param transformer | FTP-LLM showed text-tokenized LLMs work, but a custom model allows proper multi-feature fusion; 10M params trains in hours |
| H3 vs. traditional geohash | H3 | Uniform hexagonal cells, no polar distortion, hierarchical; proven by H3-CLM |
| Additive vs. concatenative fusion | Additive | BERT/TrAISFormer paradigm; keeps d_model constant, while concatenation would grow to d_model × N_features |
| 60 s time resolution | 60 seconds | FTP-LLM validated 1-min aggregation; 128 steps ≈ 2+ hours |
| Factored geohash (H3 + alt) | Separate tables, summed | Avoids combinatorial explosion (9.2M → 50K + 46) |
| Multi-head output | Separate softmax per feature | More interpretable; allows per-feature analysis |
| Uncertainty from smoothness | Variance-based | Computable at data-preparation time; no inference overhead |

11. Risk Analysis

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Geohash overfits to region | High | High | Ablation study; V3 (relative) is the fallback |
| OpenSky access issues | Medium | High | Fall back to traffic samples + SCAT |
| 60 s too coarse for terminal areas | Medium | Low | Separate terminal-area model at 10 s |
| Model too small | Low | Medium | Scale up: d_model → 512, n_layers → 16 (~40M params) |
| Altitude discretization too coarse | Low | Low | Refine to 500 ft bands (92 bands) |

12. Monitoring & Evaluation

During training (Trackio):

  • Total loss + per-feature loss curves
  • Validation loss each epoch
  • LR schedule, GPU utilization

After training:

  • Next-state accuracy (top-1, top-5 per feature)
  • Position error in km
  • Multi-step prediction (1, 5, 10, 20 steps ahead)
  • Downstream classification F1/precision/recall

Grounded in: FTP-LLM, H3-CLM, GeoFormer, TrAISFormer, and LLM4STP (reconstructed). Ready for implementation upon approval.