# AirTrackLM: LLM4STP Adapted for ADS-B Air Track Prediction ## Complete Architecture & Implementation Plan --- ## 1. Executive Summary We adapt the LLM4STP multi-feature fusion architecture (originally for maritime AIS ship trajectory prediction) to work with **ADS-B air track data**. The model uses a **decoder-only transformer** with four specialized embedding types — Prompt, Uncertainty, Geohash, and Temporal — fused together for **next-state prediction** pretraining. Once pretrained, the model is adaptable to downstream tasks like activity classification. This design is grounded in published results from: - **FTP-LLM** (arXiv:2501.17459) — LLaMA-3.1-8B for flight trajectory prediction - **H3-CLM** (arXiv:2405.09596) — H3 geohash + causal LM for maritime trajectories - **GeoFormer** (arXiv:2311.05092) — GPT-style geospatial tokenization - **TrAISFormer** (arXiv:2109.03958) — Discrete tokenization of AIS features --- ## 2. System Architecture Overview ``` ┌─────────────────────────────────────────────────────────────────────┐ │ RAW ADS-B INPUT │ │ (timestamp, latitude, longitude, altitude) │ └─────────────────────────┬───────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────┐ │ FEATURE DERIVATION PIPELINE │ │ │ │ Raw: lat, lon, alt │ │ Derived: COG, SOG, ROT, altitude_rate │ │ Meta: timestamp → (hour, day_of_week, month) │ │ │ │ Output per timestep: │ │ state_t = [lat, lon, alt, COG, SOG, ROT, alt_rate] │ └─────────────────────────┬───────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────┐ │ TOKENIZATION / ENCODING │ │ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ Geohash │ │ Continuous │ │ Temporal │ │ │ │ Tokenizer │ │ Discretizer │ │ Encoder │ │ │ │ │ │ │ │ │ │ │ │ lat,lon,alt │ │ COG,SOG,ROT │ │ hour,dow, │ │ │ │ → H3 cell + │ │ alt_rate │ │ month │ │ │ │ alt_band │ │ → bin IDs │ │ → time IDs │ │ │ └──────┬───────┘ 
└──────┬───────┘ └──────┬───────┘ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ Geohash │ │ Feature │ │ Temporal │ │ │ │ Embedding │ │ Embeddings │ │ Embedding │ │ │ │ Table │ │ Tables │ │ Table │ │ │ │ (d_model) │ │ (d_model) │ │ (d_model) │ │ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ │ │ │ │ │ └──────────┼─────────────────┼─────────────────┼──────────────────────┘ │ │ │ ▼ ▼ ▼ ┌─────────────────────────────────────────────────────────────────────┐ │ EMBEDDING FUSION LAYER │ │ │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌──────────────┐ │ │ │ Geohash │ │ Feature │ │ Temporal │ │ Uncertainty │ │ │ │ Embed │ │ Embed │ │ Embed │ │ Embed │ │ │ │ (d_model) │ │ (d_model) │ │ (d_model) │ │ (d_model) │ │ │ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ └──────┬───────┘ │ │ │ │ │ │ │ │ └──────────┬───┴──────┬───────┘ │ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ E_state = E_geo + E_feat + E_temp + E_uncert │ │ │ │ │ ▼ │ │ ┌───────────────────────────────────────────┐ │ │ │ Prompt Embedding (prepended prefix) │ │ │ │ [PROMPT_1, PROMPT_2, ..., PROMPT_k] │ │ │ └───────────────────┬───────────────────────┘ │ │ │ │ │ ▼ │ │ Input: [PROMPT_TOKENS | STATE_1 | STATE_2 | ... 
| STATE_T] │ │ │ │ │ ▼ │ │ Linear Projection → d_model │ │ │ │ │ ▼ │ │ + Positional Encoding (sinusoidal) │ │ │ └───────────────────────┬─────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────┐ │ DECODER-ONLY TRANSFORMER BACKBONE │ │ │ │ ┌─────────────────────────────────────────────────────┐ │ │ │ Transformer Block ×N_layers │ │ │ │ │ │ │ │ ┌─────────────────────────────────────────┐ │ │ │ │ │ Causal Multi-Head Self-Attention │ │ │ │ │ │ (masked: each position attends only │ │ │ │ │ │ to itself and earlier positions) │ │ │ │ │ └──────────────────┬──────────────────────┘ │ │ │ │ │ │ │ │ │ ▼ │ │ │ │ ┌─────────────────────────────────────────┐ │ │ │ │ │ LayerNorm + Residual Connection │ │ │ │ │ └──────────────────┬──────────────────────┘ │ │ │ │ │ │ │ │ │ ▼ │ │ │ │ ┌─────────────────────────────────────────┐ │ │ │ │ │ Feed-Forward Network │ │ │ │ │ │ (Linear → GELU → Linear) │ │ │ │ │ │ d_model → 4*d_model → d_model │ │ │ │ │ └──────────────────┬──────────────────────┘ │ │ │ │ │ │ │ │ │ ▼ │ │ │ │ ┌─────────────────────────────────────────┐ │ │ │ │ │ LayerNorm + Residual Connection │ │ │ │ │ └─────────────────────────────────────────┘ │ │ │ │ │ │ │ └─────────────────────────────────────────────┘ │ │ │ │ └───────────────────────┬─────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────┐ │ OUTPUT HEADS │ │ │ │ ┌─────────────────────────────────────────────────────────┐ │ │ │ PRETRAINING: Next-State Prediction Head │ │ │ │ │ │ │ │ For each position t, predict state at t+1: │ │ │ │ │ │ │ │ h_t → Linear → softmax → P(geohash_token_{t+1}) │ │ │ │ h_t → Linear → softmax → P(COG_bin_{t+1}) │ │ │ │ h_t → Linear → softmax → P(SOG_bin_{t+1}) │ │ │ │ h_t → Linear → softmax → P(ROT_bin_{t+1}) │ │ │ │ h_t → Linear → softmax → P(alt_rate_bin_{t+1}) │ │ │ │ h_t → Linear → softmax → P(alt_band_{t+1}) │ │ │ │ │ │ │ │ Loss = Σ CrossEntropy(predicted_feature, 
true_feature) │ │ │ └─────────────────────────────────────────────────────────┘ │ │ │ │ ┌─────────────────────────────────────────────────────────┐ │ │ │ DOWNSTREAM: Activity Classification Head │ │ │ │ (attached after pretraining, frozen or fine-tuned) │ │ │ │ │ │ │ │ h_[BOS] or mean(h_1:T) → MLP → softmax → class label │ │ │ └─────────────────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────┘ ``` --- ## 3. The Four Embedding Types (Detailed) ### 3.1 Geohash Embeddings — Spatial Position Encoding **Purpose**: Encode the aircraft's 3D geographic position as a discrete token. **Method**: We use **H3 hexagonal hierarchical spatial index** (Uber's H3) at resolution 5 (hex area ≈ 252 km², edge ≈ 9.85 km) for en-route flight, with an option to use resolution 7 (≈ 5.16 km², edge ≈ 1.22 km) for terminal areas. This follows the H3-CLM paper's approach but adapted for aviation's larger spatial scale. **3D Extension**: Since aircraft operate in 3D, we combine the H3 cell with an **altitude band**: ``` Geohash Token = H3_cell_index × N_alt_bands + alt_band_index Altitude bands (1000 ft increments): Band 0: 0 - 1,000 ft (ground / taxi) Band 1: 1,000 - 2,000 ft (initial climb / approach) ... Band 45: 44,000 - 45,000 ft (high cruise) N_alt_bands = 46 ``` **Vocabulary size**: At H3 resolution 5, the number of unique cells covering typical airspace is ~100K-200K. With altitude bands: `~200K × 46 ≈ 9.2M` — too large for direct embedding. **Solution — Factored Embedding**: ``` E_geohash = E_h3[h3_cell_id] + E_alt[alt_band_id] E_h3: learned embedding table, vocab = N_h3_cells (~200K or hashing trick to 50K) E_alt: learned embedding table, vocab = 46 Both project to d_model dimensions. ``` The **hashing trick**: Map H3 cell indices through a hash function to a fixed vocabulary of ~50,000 buckets. This bounds memory while maintaining spatial discrimination. 
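A minimal NumPy sketch of the factored geohash embedding with the hashing trick. The function name, the multiplicative hash constant, and the toy table sizes are illustrative, not prescribed; in the real model the tables would be learned `nn.Embedding` layers at d_model = 256.

```python
import numpy as np

D_MODEL = 8          # toy dimension; the plan uses d_model = 256
N_BUCKETS = 50_000   # hashed H3 vocabulary size
N_ALT_BANDS = 46     # 1,000 ft bands covering 0-46,000 ft

rng = np.random.default_rng(0)
E_h3 = rng.normal(scale=0.02, size=(N_BUCKETS, D_MODEL))     # hashed H3 cell table
E_alt = rng.normal(scale=0.02, size=(N_ALT_BANDS, D_MODEL))  # altitude band table

def geohash_embedding(h3_cell_id: int, alt_band_id: int) -> np.ndarray:
    """E_geohash = E_h3[hash(h3_cell)] + E_alt[alt_band] (factored, summed)."""
    bucket = (h3_cell_id * 0x9E3779B1) % N_BUCKETS  # illustrative multiplicative hash
    return E_h3[bucket] + E_alt[alt_band_id]

e = geohash_embedding(0x85283473FFFFFFF, 12)  # a res-5 H3 index, 12,000-13,000 ft band
```

Hash collisions map distant cells to the same bucket, but with 50K buckets over a regional airspace the collision rate stays low enough to preserve spatial discrimination.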
**Why H3 over traditional geohash**: H3 hexagons have uniform area (no polar distortion), hierarchical nesting, and consistent neighbor relationships — critical for trajectory continuity. ### 3.2 Temporal Embeddings — When Is the Aircraft Flying? **Purpose**: Encode temporal context — time of day affects traffic density, routes, and behavior. **Method**: Additive composition of multiple temporal scales: ``` E_temporal = E_hour[hour_of_day] + E_dow[day_of_week] + E_month[month] E_hour: 24 entries (captures rush hour vs. night patterns) E_dow: 7 entries (weekday vs. weekend traffic) E_month: 12 entries (seasonal routes, weather patterns) All project to d_model dimensions. ``` **Optional — Sinusoidal Sub-minute Encoding**: For sub-minute resolution: ``` E_minute = sin(2π × minute / 60), cos(2π × minute / 60) → linear → d_model ``` ### 3.3 Uncertainty Embeddings — How Confident Are We? **Purpose**: Encode the model's uncertainty about the current trajectory state. Aircraft in straight-and-level cruise have low uncertainty; aircraft maneuvering near airports have high uncertainty. **Method**: Compute a **trajectory smoothness score** from recent states, then discretize: ``` Uncertainty sources (sliding window of k=5 recent states): 1. Position variance: σ²_pos = var(Δlat) + var(Δlon) 2. Heading variance: σ²_COG = circular_var(COG_{t-k:t}) 3. Speed variance: σ²_SOG = var(SOG_{t-k:t}) 4. Altitude variance: σ²_alt = var(alt_rate_{t-k:t}) Combined uncertainty score: U_t = w1·σ²_pos + w2·σ²_COG + w3·σ²_SOG + w4·σ²_alt Discretize into N_uncert = 16 bins (quantile binning on training data) E_uncertainty = E_uncert_table[bin(U_t)] → d_model ``` **Weights w1-w4**: Hyperparameters tuned on validation data, or learned as part of the model. **During inference**: For multi-step prediction, uncertainty can be updated using MC-Dropout or ensemble disagreement. 
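A minimal sketch of the smoothness-based uncertainty score defined above, with the circular-variance form for heading spelled out. The function name and the equal default weights are illustrative; in practice w1-w4 would be tuned or learned as stated.

```python
import numpy as np

def uncertainty_score(lats, lons, cogs, sogs, alt_rates, k=5,
                      w=(1.0, 1.0, 1.0, 1.0)):
    """Variance-based smoothness score U_t over the last k states (w = w1..w4)."""
    pos_var = np.var(np.diff(lats[-k:])) + np.var(np.diff(lons[-k:]))
    ang = np.radians(cogs[-k:])  # circular variance: 1 - |mean resultant vector|
    cog_var = 1.0 - np.hypot(np.mean(np.sin(ang)), np.mean(np.cos(ang)))
    sog_var = np.var(sogs[-k:])
    alt_var = np.var(alt_rates[-k:])
    return w[0]*pos_var + w[1]*cog_var + w[2]*sog_var + w[3]*alt_var

# straight-and-level cruise scores ~0; a turning aircraft scores higher
cruise = uncertainty_score(np.linspace(40, 40.4, 6), np.linspace(2, 2.4, 6),
                           np.full(6, 90.0), np.full(6, 450.0), np.zeros(6))
turn = uncertainty_score(np.linspace(40, 40.4, 6), np.linspace(2, 2.4, 6),
                         np.array([0.0, 45, 90, 135, 180, 225]),
                         np.full(6, 450.0), np.zeros(6))
```

The scores would then be quantile-binned into the 16 uncertainty bins on the training set.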
### 3.4 Prompt Embeddings — Task and Context Metadata

**Purpose**: Provide metadata context about the flight, analogous to system prompts in LLMs. Enables task conditioning and multi-task learning.

**Method**: Learnable prompt tokens prepended to the trajectory:

```
Prompt token vocabulary:
  - Aircraft category: [HEAVY, LARGE, SMALL, ROTORCRAFT, GLIDER, UAV, UNKNOWN] (7)
  - Flight phase:      [CLIMB, CRUISE, DESCENT, APPROACH, GROUND, UNKNOWN]     (6)
  - Region:            [CONUS, EUROPE, ASIA, OTHER]                            (4)
  - Task:              [PREDICT, CLASSIFY, DETECT_ANOMALY]                     (3)
  - Special:           [BOS, EOS, PAD, MASK]                                   (4)

Total prompt vocab: 24 tokens

Prompt sequence (prepended):
  [BOS, TASK_TOKEN, AIRCRAFT_TOKEN, PHASE_TOKEN, REGION_TOKEN]

Each has a learned embedding of dimension d_model.
```

**For downstream classification**: Change TASK_TOKEN to CLASSIFY; the output at the BOS position is used for classification.

---

## 4. Feature Derivation Pipeline

### 4.1 Raw Input

```
timestamp  (Unix epoch seconds)
latitude   (degrees, WGS84)
longitude  (degrees, WGS84)
altitude   (feet, barometric or geometric)
```

### 4.2 Derived Features

```python
import numpy as np

def derive_features(timestamps, lats, lons, alts):
    """
    Derive COG, SOG, ROT, and altitude rate from raw position data.
    All inputs: numpy arrays of shape (N,) for a single trajectory.
    Returns arrays of shape (N,); leading elements are NaN
    (one for COG/SOG/alt_rate, two for ROT).
""" dt = np.diff(timestamps) # seconds dt = np.maximum(dt, 1e-6) # avoid division by zero # --- Course Over Ground (COG) --- lat1, lat2 = np.radians(lats[:-1]), np.radians(lats[1:]) dlon = np.radians(np.diff(lons)) x = np.sin(dlon) * np.cos(lat2) y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(dlon) COG = np.degrees(np.arctan2(x, y)) % 360 # [0, 360) # --- Speed Over Ground (SOG) --- dlat = np.radians(np.diff(lats)) a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2 c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a)) distance_nm = 3440.065 * c # Earth radius in nautical miles SOG = distance_nm / (dt / 3600) # knots # --- Rate of Turn (ROT) --- dCOG = np.diff(COG) dCOG = (dCOG + 180) % 360 - 180 # normalize to [-180, 180] ROT = np.full(len(lats), np.nan) ROT[2:] = dCOG / dt[1:] # degrees per second # --- Rate of Altitude Change --- dalt = np.diff(alts) # feet alt_rate = dalt / (dt / 60) # feet per minute # Pad first elements COG_full = np.concatenate([[np.nan], COG]) SOG_full = np.concatenate([[np.nan], SOG]) alt_rate_full = np.concatenate([[np.nan], alt_rate]) return COG_full, SOG_full, ROT, alt_rate_full ``` ### 4.3 Feature Discretization | Feature | Range | Bin Width | N_bins | Notes | |---------------|-------------------|--------------|--------|--------------------| | COG | [0, 360) | 5° | 72 | Circular | | SOG | [0, 600] kts | 5 knots | 121 | Capped at ~Mach 1 | | ROT | [-6, 6] °/s | 0.25 °/s | 49 | Capped ±6°/s | | Altitude Rate | [-6000, 6000] fpm | 200 ft/min | 61 | Capped ±6000 fpm | Outliers beyond caps clipped to boundary bin. ### 4.4 Trajectory Preprocessing Pipeline ``` 1. Segment raw ADS-B by ICAO24 + temporal gaps > 15 min → individual flights 2. Resample to fixed Δt = 60 seconds (linear interp for position, circular for heading) 3. Derive features (COG, SOG, ROT, alt_rate) 4. Drop first 2 points per trajectory (NaN from derivation) 5. Filter: remove trajectories with < 20 points (< 20 minutes) 6. 
Compute H3 cell (res 5) + altitude band for each point
7.  Discretize all continuous features into bins
8.  Compute uncertainty scores (sliding window k=5)
9.  Extract temporal features (hour, dow, month)
10. Construct prompt tokens from metadata (if available)
```

---

## 5. Model Hyperparameters

### 5.1 Model Dimensions

| Parameter        | Value  | Rationale                                          |
|------------------|--------|----------------------------------------------------|
| d_model          | 256    | H3-CLM found 256-1024 effective                    |
| n_heads          | 8      | head_dim = 32                                      |
| n_layers         | 8      | Moderate depth for a ~7M-param backbone            |
| d_ff             | 1024   | 4× d_model (standard)                              |
| max_seq_len      | 128    | 128 states × 60s ≈ 2 hours of flight               |
| n_prompt_tokens  | 5      | [BOS, TASK, AIRCRAFT, PHASE, REGION]               |
| dropout          | 0.1    |                                                    |

**Total parameters**: ~7M in the transformer backbone plus ~13M in the 50K-entry hashed H3 embedding table (the remaining tables are negligible), roughly 20M total; weight-tying the geohash output head to its embedding table avoids another ~13M. Still trainable on a single GPU in hours.

### 5.2 Vocabulary Sizes

| Embedding        | Vocab  | Dim |
|------------------|--------|-----|
| H3 cells         | 50,000 | 256 |
| Altitude bands   | 46     | 256 |
| COG bins         | 72     | 256 |
| SOG bins         | 121    | 256 |
| ROT bins         | 49     | 256 |
| Alt rate bins    | 61     | 256 |
| Hour of day      | 24     | 256 |
| Day of week      | 7      | 256 |
| Month            | 12     | 256 |
| Uncertainty bins | 16     | 256 |
| Prompt tokens    | 24     | 256 |

### 5.3 State Token Composition

Each timestep → single state token via additive fusion:

```
E_state_t = E_h3[h3_id_t]      + E_alt_band[alt_band_t]       # Geohash (3D position)
          + E_COG[cog_bin_t]   + E_SOG[sog_bin_t]             # Kinematics
          + E_ROT[rot_bin_t]   + E_alt_rate[alt_rate_bin_t]   # Dynamics
          + E_hour[hour_t] + E_dow[dow_t] + E_month[month_t]  # Temporal
          + E_uncert[uncert_bin_t]                            # Uncertainty

E_state_t ∈ R^{d_model}
```

This additive fusion follows BERT (token + segment + position) and TrAISFormer.

---

## 6. Training Recipe

### 6.1 Pretraining: Next-State Prediction (Causal LM)

**Objective**: Given states 1..T, predict state at T+1 (applied autoregressively at every position).
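The input/target shift implied by this objective can be sketched as follows (the function and array names are illustrative; each row holds the discretized feature IDs for one timestep):

```python
import numpy as np

def make_causal_pairs(state_tokens):
    """Shift a (T, n_features) matrix of discrete state-token IDs into
    next-state training pairs: position t is supervised by state t+1."""
    inputs = state_tokens[:-1]   # states 1..T-1 enter the decoder
    targets = state_tokens[1:]   # state t+1 is the label at position t
    return inputs, targets

tokens = np.arange(12).reshape(4, 3)  # T=4 timesteps, 3 feature IDs per step
x, y = make_causal_pairs(tokens)
```

The causal attention mask guarantees that the hidden state at position t only sees positions 1..t, so every position yields a valid training signal in a single forward pass.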
**Loss**: ``` L = Σ_{t=1}^{T-1} [ λ_geo · CE(ŷ_geo_t, y_geo_{t+1}) + λ_COG · CE(ŷ_COG_t, y_COG_{t+1}) + λ_SOG · CE(ŷ_SOG_t, y_SOG_{t+1}) + λ_ROT · CE(ŷ_ROT_t, y_ROT_{t+1}) + λ_alt · CE(ŷ_alt_rate_t, y_alt_rate_{t+1}) + λ_altb · CE(ŷ_alt_band_t, y_alt_band_{t+1}) ] λ values default to 1.0 (equal weighting). ``` **Training hyperparameters** (based on FTP-LLM + H3-CLM): | Parameter | Value | |----------------------|---------------------| | Optimizer | AdamW | | Learning rate | 5e-4 | | LR Schedule | Cosine + 5% warmup | | Batch size (per GPU) | 64 | | Gradient accumulation| 4 (effective = 256) | | Max epochs | 30 (early stop p=5) | | Weight decay | 0.01 | | Gradient clipping | 1.0 | | Mixed precision | bf16 | **Data windowing**: Sliding window size=128, stride=64 (50% overlap). ### 6.2 Downstream: Activity Classification After pretraining, attach classification head: ``` h_BOS → Linear(256, 128) → GELU → Dropout(0.1) → Linear(128, N_classes) ``` **Fine-tuning options**: - **A**: Freeze backbone, train head only (fast, small data) - **B**: Full fine-tune, backbone lr=1e-5, head lr=1e-3 --- ## 7. Dataset Strategy ### 7.1 Prototyping — `traffic` Python Library ```python from traffic.data.samples import landing_zurich_2019 # ~2,000 flights near Zurich # Columns: timestamp, icao24, callsign, latitude, longitude, altitude, # groundspeed, track, vertical_rate, ... ``` Instant access, clean, well-documented. Single airport, limited diversity. 
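Whichever source is used, raw state vectors are first split into individual flights (step 1 of the preprocessing pipeline in §4.4). A minimal pandas sketch, with the function name and the 15-minute gap threshold taken as illustrative defaults:

```python
import pandas as pd

def segment_flights(df, gap_min=15):
    """Split state vectors into flight segments wherever one aircraft
    (icao24) goes silent for more than gap_min minutes."""
    segments = []
    for _, g in df.sort_values("timestamp").groupby("icao24"):
        new_flight = g["timestamp"].diff() > pd.Timedelta(minutes=gap_min)
        for _, seg in g.groupby(new_flight.cumsum()):
            segments.append(seg.reset_index(drop=True))
    return segments

df = pd.DataFrame({
    "icao24": ["abc123"] * 5 + ["def456"] * 3,
    "timestamp": pd.to_datetime([
        "2024-01-15 00:00", "2024-01-15 00:01", "2024-01-15 00:02",
        "2024-01-15 00:40", "2024-01-15 00:41",   # 38-minute gap -> new flight
        "2024-01-15 00:00", "2024-01-15 00:01", "2024-01-15 00:02",
    ]),
})
flights = segment_flights(df)  # 2 segments for abc123, 1 for def456
```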
### 7.2 Training — OpenSky Network ```python from pyopensky.trino import Trino trino = Trino() df = trino.rawquery(""" SELECT time, icao24, lat, lon, baroaltitude, velocity, heading, vertrate FROM state_vectors_data4 WHERE hour >= '2024-01-15 00:00:00' AND hour < '2024-01-15 12:00:00' AND lat BETWEEN 40 AND 55 AND lon BETWEEN -10 AND 20 ORDER BY icao24, time """) ``` **Target**: - **Region A** (train): Europe, 1 month → ~500K-1M flights - **Region B** (OOD test): US CONUS, 1 week → ~200K flights - **Region C** (far test): East Asia, 1 week → ~100K flights ### 7.3 Alternative: SCAT Dataset ~170K en-route flights over Sweden, Zenodo. Pre-segmented, clean. ### 7.4 Data Split ``` Training: 70% of Region A flights Validation: 15% of Region A flights Test (IID): 15% of Region A flights Test (OOD): 100% of Region B flights Test (Far): 100% of Region C flights ``` Split by **flight** (not time window) to avoid data leakage. --- ## 8. Ablation Study: Geohash Geographic Dependency ### 8.1 Hypothesis > Geohash embeddings encode **absolute geographic position**, causing the model to memorize region-specific patterns (airways, approach paths, airspace structure). This improves in-distribution performance but degrades transfer to unseen regions. 
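One position-invariant probe of this hypothesis is to tokenize displacement from the trajectory's first fix instead of absolute position (the relative-geohash idea), so that identical maneuvers map to identical tokens regardless of region. A minimal sketch; the helper name and the cosine-scaling choice are illustrative:

```python
import numpy as np

def relative_track(lats, lons):
    """Re-anchor a trajectory at its first fix: downstream geohash tokens
    then encode trajectory shape, not absolute geography."""
    dlat = lats - lats[0]
    # shrink east-west degrees by cos(lat) so offsets are metrically comparable
    dlon = (lons - lons[0]) * np.cos(np.radians(lats[0]))
    return dlat, dlon

# identical climbs flown over different regions collapse to the same offsets
a = relative_track(np.array([40.0, 40.1, 40.2]), np.array([-3.0, -2.9, -2.8]))
b = relative_track(np.array([40.0, 40.1, 40.2]), np.array([120.0, 120.1, 120.2]))
```

The resulting offsets would be fed to the H3 tokenizer anchored at a fixed origin, trading away airway and airspace memorization for transferability.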
### 8.2 Experimental Variants | Variant | Geohash Type | Description | |---------|-------------|-------------| | **V1: Full Model** | H3 absolute | Complete architecture as described | | **V2: No Geohash** | None | Remove geohash entirely; model sees only kinematics + temporal + uncertainty | | **V3: Relative Geohash** | H3 relative | H3 cell of (Δlat, Δlon) from trajectory start — position-invariant | | **V4: Multi-Resolution** | H3 res 3+5+7 | 3 resolutions summed (coarse→fine) | | **V5: Continuous Position** | Linear projection | `Linear([lat, lon, alt] → d_model)` — no discretization | ### 8.3 Evaluation Metrics For each variant × each test set (IID, OOD, Far): | Metric | Description | |--------|-------------| | Geo Accuracy | % correct H3 cell prediction | | Position MAE | Mean absolute error in km | | COG MAE | Heading error in degrees | | SOG MAE | Speed error in knots | | Multi-step ADE | Average displacement error over 5 predicted steps | | Multi-step FDE | Final displacement error at step 5 | ### 8.4 Key Comparisons | Comparison | Tests | |-----------|-------| | V1 vs V2 (IID) | How much geohash helps when test = train region | | V1 vs V2 (OOD) | If V2 > V1 on OOD → geohash causes geographic overfitting | | V1 vs V3 (OOD) | If V3 good on both IID and OOD → relative geohash is the sweet spot | | V4 (all) | Multi-resolution: coarse cells transfer, fine cells specialize? | | V5 (all) | Does continuous encoding avoid discretization issues? | ### 8.5 Expected Outcomes - **V1**: Best IID, worst OOD (hypothesis) - **V3**: Best compromise — predicted winner - **V5**: May struggle (loses discrete token structure transformers excel at) - **V2**: Strong OOD baseline, sacrifices IID ### 8.6 Additional Analysis - **Attention visualization**: V1 vs V3 attention patterns - **Embedding clustering**: t-SNE of geohash embeddings colored by region - **Learning curves**: IID vs OOD performance vs training data size --- ## 9. 
Implementation Phases ### Phase 1: Data Pipeline (Week 1) - Set up `traffic` library, extract sample trajectories - Implement feature derivation (COG, SOG, ROT, alt_rate) - Implement H3 geohash encoding + altitude banding - Implement feature discretization (binning) - Implement uncertainty score computation - Build PyTorch Dataset class with sliding window - Unit tests for all derivation functions ### Phase 2: Model Architecture (Week 1-2) - Implement all embedding tables - Implement additive fusion layer - Implement prompt token prepending - Implement decoder-only transformer backbone - Implement multi-head output (6 prediction heads) - Implement classification head (for downstream) - Forward pass test with dummy data ### Phase 3: Pretraining (Week 2-3) - Implement training loop with multi-task loss - Prototyping run on `traffic` data (small, fast iteration) - Scale to OpenSky data - Monitor loss curves, validate convergence - Save best checkpoint ### Phase 4: Downstream Adaptation (Week 3-4) - Implement classification fine-tuning pipeline - Test on activity classification task - Compare frozen vs. fine-tuned backbone ### Phase 5: Ablation Study (Week 4-5) - Implement all 5 geohash variants - Train each variant with identical hyperparameters - Evaluate on IID, OOD, and Far test sets - Generate comparison tables and visualizations - Write analysis of geographic dependency findings --- ## 10. Key Design Decisions & Rationale | Decision | Choice | Why | |----------|--------|-----| | Custom model vs. pretrained LLM | Custom ~10M param transformer | FTP-LLM showed text-tokenized LLMs work, but custom allows proper multi-feature fusion. 10M params trains in hours. | | H3 vs. traditional geohash | H3 | Uniform hexagonal cells, no polar distortion, hierarchical. Proven by H3-CLM. | | Additive vs. concatenative fusion | Additive | BERT/TrAISFormer paradigm. Keeps d_model constant. Concatenation → d_model × N_features = massive. 
| | 60s time resolution | 60 seconds | FTP-LLM validated 1-min aggregation. 128 steps ≈ 2+ hours. | | Factored geohash (H3 + alt) | Separate tables, summed | Avoids combinatorial explosion (9.2M → 50K + 46). | | Multi-head output | Separate softmax per feature | More interpretable, allows per-feature analysis. | | Uncertainty from smoothness | Variance-based | Computable at data time, no inference overhead. | --- ## 11. Risk Analysis | Risk | Likelihood | Impact | Mitigation | |------|-----------|--------|------------| | Geohash overfits to region | High | High | Ablation study; V3 (relative) is fallback | | OpenSky access issues | Medium | High | Fallback: `traffic` samples + SCAT | | 60s too coarse for terminal | Medium | Low | Separate terminal model at 10s | | Model too small | Low | Medium | Scale: d_model→512, n_layers→16 (~40M) | | Alt discretization too coarse | Low | Low | Refine to 500ft bands (92) | --- ## 12. Monitoring & Evaluation **During training** (Trackio): - Total loss + per-feature loss curves - Validation loss each epoch - LR schedule, GPU utilization **After training**: - Next-state accuracy (top-1, top-5 per feature) - Position error in km - Multi-step prediction (1, 5, 10, 20 steps ahead) - Downstream classification F1/precision/recall --- *Grounded in: FTP-LLM, H3-CLM, GeoFormer, TrAISFormer, and LLM4STP (reconstructed). Ready for implementation upon approval.*