# AirTrackLM

A decoder-only transformer for ADS-B air track next-state prediction, adapted from the LLM4STP architecture.

## Architecture

- **Model**: Custom decoder-only transformer with ~7M parameters
- **4 Embedding Types**: Geohash (40-bit binary per axis, 3D), Kinematic Features (COG/SOG/ROT/AltRate), Temporal (sub-second sinusoidal), Uncertainty (4 methods + learned heteroscedastic)
- **Pretraining**: Next-state prediction (predict all features at t+1 from sequence up to t)
- **Coordinate System**: ENU (East-North-Up) with 3-point central derivative for velocity computation
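The ENU frame is a local tangent plane anchored at a reference point. Below is a minimal sketch of the geodetic → ENU conversion. It assumes WGS-84 latitude/longitude, altitude already converted to metres and treated as height above the ellipsoid (ADS-B barometric altitude in feet would need converting first), and a per-track reference point such as the first fix. The function names are illustrative and not taken from `data_pipeline.py`.

```python
import math

# WGS-84 ellipsoid constants
WGS84_A = 6378137.0                 # semi-major axis [m]
WGS84_F = 1 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert WGS-84 geodetic coordinates to Earth-Centred Earth-Fixed metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Prime-vertical radius of curvature at this latitude
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z


def geodetic_to_enu(lat_deg, lon_deg, alt_m, ref_lat_deg, ref_lon_deg, ref_alt_m):
    """Express a point in the East-North-Up frame anchored at a reference point (hypothetical helper)."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, alt_m)
    x0, y0, z0 = geodetic_to_ecef(ref_lat_deg, ref_lon_deg, ref_alt_m)
    dx, dy, dz = x - x0, y - y0, z - z0
    lat0, lon0 = math.radians(ref_lat_deg), math.radians(ref_lon_deg)
    sl, cl = math.sin(lat0), math.cos(lat0)
    so, co = math.sin(lon0), math.cos(lon0)
    # Rotate the ECEF offset into the local tangent plane
    east = -so * dx + co * dy
    north = -sl * co * dx - sl * so * dy + cl * dz
    up = cl * co * dx + cl * so * dy + sl * dz
    return east, north, up
```

Anchoring each track at its own first fix keeps ENU coordinates small and local. That is a design assumption here, not a documented choice of this repo.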

## Uncertainty Methods

1. **Kinematic Variance**: Sliding-window variance of COG/SOG/ROT/alt_rate
2. **Prediction Residual**: Deviation from a constant-velocity prediction model
3. **Spatial Density**: Data coverage proxy (fewer nearby training points = higher uncertainty)
4. **Flight Phase Entropy**: Entropy of the phase classification in a window (mixed phases = uncertain)
5. **Learned Heteroscedastic**: Model predicts its own log-variance per output head (aleatoric)
6. **MC-Dropout**: Monte Carlo dropout at inference for epistemic uncertainty
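Methods 1, 2, 4 and 5 reduce to simple windowed statistics or a per-sample loss. The plain-Python sketch below illustrates them and is not the code in `uncertainty.py`. The function names, the trailing (causal) window, measuring entropy in bits (log2), and dropping the constant term from the Gaussian NLL are all assumptions. Spatial density (3) needs an index over the training points and MC-dropout (6) needs the trained network, so both are omitted. COG is an angle, so a linear variance across the 359°/0° wrap is misleading; a circular variance is the safer choice there, while the linear version suits SOG and alt_rate.

```python
import math
from collections import Counter


def sliding_variance(values, window):
    """Method 1 (sketch): trailing-window population variance of a linear signal."""
    out = []
    for t in range(len(values)):
        w = values[max(0, t - window + 1): t + 1]
        if len(w) < 2:
            out.append(0.0)  # not enough samples yet
            continue
        mean = sum(w) / len(w)
        out.append(sum((v - mean) ** 2 for v in w) / len(w))
    return out


def cv_residual(points, times):
    """Method 2 (sketch): distance between each ENU point and a constant-velocity
    extrapolation from the two previous points (first two residuals are 0)."""
    res = [0.0] * min(2, len(points))
    for t in range(2, len(points)):
        dt_prev = times[t - 1] - times[t - 2]
        dt = times[t] - times[t - 1]
        vel = [(a - b) / dt_prev for a, b in zip(points[t - 1], points[t - 2])]
        pred = [p + v * dt for p, v in zip(points[t - 1], vel)]
        res.append(math.dist(points[t], pred))
    return res


def phase_entropy(phases, window):
    """Method 4 (sketch): Shannon entropy (bits) of phase labels in a trailing window."""
    out = []
    for t in range(len(phases)):
        w = phases[max(0, t - window + 1): t + 1]
        n = len(w)
        out.append(sum(-(c / n) * math.log2(c / n) for c in Counter(w).values()))
    return out


def gaussian_nll(y, mu, log_var):
    """Method 5 (sketch): Gaussian NLL with a predicted log-variance,
    omitting the constant 0.5 * log(2 * pi)."""
    return 0.5 * (log_var + (y - mu) ** 2 / math.exp(log_var))
```

Predicting the log-variance rather than the variance keeps the variance positive without an explicit constraint, which is a common choice for heteroscedastic heads.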

## Features

- **Inputs**: Raw ADS-B (lat, lon, alt, timestamp)
- **Derived**: COG, SOG, ROT, altitude rate via 3-point central derivative on ENU positions
- **Geohash**: 40-bit binary encoding per axis (E, N, U) = 120-bit 3D position token
- **Temporal**: Sinusoidal second-of-day (sub-second resolution) + calendar embeddings + Δt encoding
- **Output Heads**: Binary geohash prediction, continuous Δ-ENU regression, COG/SOG/ROT/AltRate bin classification
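The derived features can be sketched as follows, assuming ENU positions in metres and timestamps in seconds. The 3-point derivative here is the secant through the neighbouring samples, (x[i+1] - x[i-1]) / (t[i+1] - t[i-1]), which reduces to the usual central difference for uniform sampling. Several details are illustrative assumptions, not taken from `data_pipeline.py`: one-sided differences at the track ends, unwrapping COG before computing ROT, geohash-style interval bisection for the 40-bit codes, the per-axis bounds, and power-of-two frequencies for the second-of-day encoding. All function names are hypothetical.

```python
import math


def central_derivative(values, times):
    """3-point derivative: central secant inside, one-sided at the ends (needs >= 2 samples)."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / (times[hi] - times[lo]))
    return out


def kinematics(east, north, up, times):
    """SOG (m/s), COG (deg clockwise from north), ROT (deg/s), altitude rate (m/s)."""
    ve = central_derivative(east, times)
    vn = central_derivative(north, times)
    vu = central_derivative(up, times)
    sog = [math.hypot(e, n) for e, n in zip(ve, vn)]
    cog = [math.degrees(math.atan2(e, n)) % 360.0 for e, n in zip(ve, vn)]
    # Unwrap COG so a 359 -> 1 deg turn counts as +2 deg rather than -358 deg
    unwrapped = [cog[0]]
    for c in cog[1:]:
        step = (c - unwrapped[-1] + 180.0) % 360.0 - 180.0
        unwrapped.append(unwrapped[-1] + step)
    rot = central_derivative(unwrapped, times)
    return sog, cog, rot, vu


def encode_axis(value, lo, hi, n_bits=40):
    """Geohash-style code: halve [lo, hi) n_bits times, emitting 1 for the upper half."""
    bits = []
    for _ in range(n_bits):
        mid = (lo + hi) / 2
        if value >= mid:
            bits.append(1)
            lo = mid
        else:
            bits.append(0)
            hi = mid
    return bits


def position_token(east, north, up, bounds):
    """Concatenate the E, N, U codes into one 3 x 40 = 120-bit token."""
    return [b for v, (lo, hi) in zip((east, north, up), bounds)
            for b in encode_axis(v, lo, hi)]


def second_of_day_features(seconds, n_freqs=18):
    """sin/cos of the second of day at 1, 2, 4, ... cycles per day; with 18 octaves
    the fastest component has a period of about 0.66 s."""
    feats = []
    for k in range(n_freqs):
        angle = 2 * math.pi * (2 ** k) * seconds / 86400.0
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats
```

For example, `position_token(e, n, u, [(-5e5, 5e5), (-5e5, 5e5), (0.0, 1.5e4)])` yields a 120-bit position token. Those bounds are hypothetical and would need to cover the operating area.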

## Data

Training data from the `traffic` Python library (real ADS-B surveillance data).

## Files

- `model.py`: Full model architecture (AirTrackLM, embeddings, loss functions)
- `data_pipeline.py`: ENU conversion, 3-point derivatives, geohash encoding, dataset
- `uncertainty.py`: 6 uncertainty quantification methods
- `train.py`: Training utilities
- `train_full.py`: Full GPU training script with Hub push
- `ARCHITECTURE.md`: Detailed architecture document

## Based On

- **LLM4STP** (Joker-hang/LLM4STP): Binary geohash encoding, GPT-2 backbone concept
- **FTP-LLM** (arXiv:2501.17459): LLM for flight trajectory prediction
- **H3-CLM** (arXiv:2405.09596): Hexagonal geohash + causal LM for maritime trajectories
- **GeoFormer** (arXiv:2311.05092): GPT-style geospatial tokenization