# Phase 12: TerraMind TiM (Thinking-in-Modalities) on NYC LULC
|
|
| ## Goal |
|
|
Replicate IBM-ESA's headline TerraMind innovation, TiM (Thinking-in-Modalities),
on our NYC LULC task. The hypothesis from the TerraMind paper (Jakubik et al.,
arXiv:2504.11171) is that generating intermediate modality tokens (e.g.,
synthetic LULC) BEFORE predicting the downstream task improves accuracy by 2–5 pp.
|
|
| This is the *paper-grade differentiator* for the hackathon submission. To my |
| knowledge nobody has publicly reproduced TiM on NYC. |
|
|
| ## Status |
|
|
Scaffold + recipe research done; awaiting a GPU window.
|
|
| ## Recipe (from TerraMind GitHub examples) |
|
|
| The reference is `terramind_v1_small_sen1floods11.ipynb` in IBM's terramind |
repo, which shows TiM with `tim_modalities: [LULC]` for binary water segmentation.
|
|
| Adaptation for our 5-class NYC LULC: |
|
|
| ```yaml |
| # delta from training/terramind_v1_base_nyc_phase2.yaml |
| model: |
| init_args: |
| model_args: |
| backbone: terramind_v1_base_tim # vs terramind_v1_base |
| tim_modalities: [LULC] # generate synthetic LULC tokens first |
| backbone_modalities: [S2L2A, S1RTC, DEM] # actual inputs |
| backbone_use_temporal: true |
| backbone_temporal_n_timestamps: 4 |
| # rest unchanged from Phase 2 |
| ``` |
|
|
| The TiM model generates synthetic LULC tokens from the input modalities, |
| then uses those tokens AS ADDITIONAL CONTEXT for the downstream LULC head. |
Self-referential: the model "thinks in LULC" before predicting LULC.
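
To make that data flow concrete, here is a conceptual sketch of the TiM pass.
The names (`encoder`, `lulc_generator`, `head`) are illustrative stand-ins,
not actual TerraMind internals:

```python
import torch

# Conceptual sketch only -- illustrative names, not TerraMind's real modules.
def tim_forward(inputs, encoder, lulc_generator, head):
    tokens = encoder(inputs)              # 1. encode the real modalities (S2/S1/DEM)
    lulc_tokens = lulc_generator(tokens)  # 2. "think": generate synthetic LULC tokens
    context = torch.cat([tokens, lulc_tokens], dim=1)
    return head(context)                  # 3. predict LULC with the extra context
```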
|
|
| For our 5-class NYC LULC where the GROUND TRUTH IS ALSO LULC, this is a |
| slightly pathological case. The cleaner TiM ablation would use a different |
intermediate modality (NDVI from S2 → LULC, or LULC from S1 alone). Worth
| testing both. |
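
If we go the NDVI route, the intermediate target can be derived straight from
the S2 chips. A minimal sketch, assuming the standard 12-band L2A ordering
(B4=red at index 3, B8=NIR at index 7); verify against our actual chip layout:

```python
import torch

def ndvi(s2l2a: torch.Tensor) -> torch.Tensor:
    """NDVI from a (B, 12, H, W) S2L2A chip stack; band indices are assumed."""
    red = s2l2a[:, 3]   # B4 (assumed index)
    nir = s2l2a[:, 7]   # B8 (assumed index)
    return (nir - red) / (nir + red + 1e-6)  # epsilon avoids divide-by-zero
```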
|
|
| ## Plan |
|
|
1. Scaffold (this file): done.
2. Write `tim_smoke.py`: a tiny smoke run to confirm the TiM model loads and
   trains on our NYC dataset without architectural changes (see the sketch
   after this list).
3. Write `phase3_tim.yaml`: the TiM-enabled training config.
| 4. Run the fine-tune (~6 GPU-hr). |
| 5. Eval against Phase-2 (no-TiM) on the same 64-chip held-out test split. |
| Same metrics: per-class IoU, overall mIoU, Pixel_Accuracy, F1. |
| 6. Publish as `msradam/TerraMind-base-NYC-TiM-LULC` if it beats Phase 2 by |
   at least 1 pp on test mIoU.
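
A sketch of what `tim_smoke.py` could look like, assuming the TiM backbone is
registered in terratorch's `BACKBONE_REGISTRY` under `terramind_v1_base_tim`
and the kwargs carry over from the YAML delta above (this follows IBM's
examples, but the exact constructor signature should be checked; the Phase 2
temporal stacking is omitted for brevity):

```python
import torch
from terratorch.registry import BACKBONE_REGISTRY

# Assumed registry name and kwargs -- mirror the YAML delta above.
backbone = BACKBONE_REGISTRY.build(
    "terramind_v1_base_tim",
    pretrained=True,
    modalities=["S2L2A", "S1RTC", "DEM"],
    tim_modalities=["LULC"],
)

# One fake 224x224 chip per modality; TerraMind consumes a dict keyed by modality.
batch = {
    "S2L2A": torch.randn(1, 12, 224, 224),
    "S1RTC": torch.randn(1, 2, 224, 224),
    "DEM": torch.randn(1, 1, 224, 224),
}
with torch.no_grad():
    feats = backbone(batch)
print([f.shape for f in feats])  # loads + forwards cleanly => smoke test passes
```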
| |
| ## Eval gate |
| |
| Outcome | Test mIoU vs Phase 2 | Action |
|---|---|---|
| Strong | > +2 pp | Publish; headline result |
| Acceptable | 0 to +2 pp | Publish; "TiM stable on NYC" framing |
| Negative | < 0 pp | Publish negative result; document the framing |
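
The gate is mechanical enough to encode. A sketch using torchmetrics, where
`preds_tim`, `preds_phase2`, and `targets` are hypothetical (N, H, W)
class-index tensors over the 64-chip test split:

```python
import torch
from torchmetrics import JaccardIndex

def macro_miou(preds: torch.Tensor, targets: torch.Tensor) -> float:
    # Fresh metric instance per call so state never leaks between runs.
    m = JaccardIndex(task="multiclass", num_classes=5, average="macro")
    return m(preds, targets).item()

def gate(preds_tim, preds_phase2, targets) -> str:
    """Map the mIoU delta (in pp) onto the publish decision above."""
    delta_pp = 100 * (macro_miou(preds_tim, targets) - macro_miou(preds_phase2, targets))
    if delta_pp > 2.0:
        return "strong: publish, headline result"
    if delta_pp >= 0.0:
        return "acceptable: publish, 'TiM stable on NYC' framing"
    return "negative: publish negative result"
```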
| |
| ## Risk |
| |
Medium. The TiM recipe needs adaptation from the sen1floods11 setup; 1–2 hours
of debug time are likely. Backup plan if the TiM model variant doesn't load:
implement TiM-as-input-augmentation manually (run base TerraMind in generate
mode to produce synthetic LULC, then concatenate it to the input for the
fine-tune); see the sketch below.
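
A sketch of that backup path. `generate_synthetic_lulc` is a hypothetical
placeholder for TerraMind's any-to-any generation entry point; see the
generation examples in IBM's terramind repo for the real call:

```python
import torch

def generate_synthetic_lulc(batch: dict) -> torch.Tensor:
    """HYPOTHETICAL: run base TerraMind in generate mode, return (B, 1, H, W) LULC."""
    raise NotImplementedError  # wire up to the actual terramind generation API

def augment_batch(batch: dict) -> dict:
    # Append the synthetic LULC map to S2 as an extra channel, giving the
    # fine-tune TiM-style context without needing the TiM backbone variant.
    lulc = generate_synthetic_lulc(batch).float()
    out = dict(batch)
    out["S2L2A"] = torch.cat([out["S2L2A"], lulc], dim=1)
    return out
```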
| |
| ## Reproduction (planned) |
| |
| ```bash |
| docker exec terramind bash -c " |
| terratorch fit --config /root/config_phase3_tim.yaml |
| terratorch test --config /root/config_phase3_tim.yaml --ckpt_path .../best_val_loss.ckpt |
| " |
| ``` |
| |