
Phase 12 β€” TerraMind TiM (Thinking-in-Modalities) on NYC LULC

Goal

Replicate IBM-ESA's headline TerraMind innovation — TiM (Thinking-in-Modalities) — on our NYC LULC task. The hypothesis from the TerraMind paper (Jakubik et al., arXiv:2504.11171) is that generating intermediate modality tokens (e.g., synthetic LULC) BEFORE predicting the downstream task improves accuracy by 2–5 pp.

This is the paper-grade differentiator for the hackathon submission. To my knowledge nobody has publicly reproduced TiM on NYC.

Status

Scaffold and recipe research done. Awaiting a GPU window.

Recipe (from TerraMind GitHub examples)

The reference is terramind_v1_small_sen1floods11.ipynb in IBM's terramind repo, which shows TiM with tim_modalities: [LULC] for binary water segmentation.

Adaptation for our 5-class NYC LULC:

# delta from training/terramind_v1_base_nyc_phase2.yaml
model:
  init_args:
    model_args:
      backbone: terramind_v1_base_tim     # vs terramind_v1_base
      tim_modalities: [LULC]              # generate synthetic LULC tokens first
      backbone_modalities: [S2L2A, S1RTC, DEM]   # actual inputs
      backbone_use_temporal: true
      backbone_temporal_n_timestamps: 4
      # rest unchanged from Phase 2

The TiM model generates synthetic LULC tokens from the input modalities, then uses those tokens AS ADDITIONAL CONTEXT for the downstream LULC head. Self-referential β€” the model "thinks in LULC" before predicting LULC.

For our 5-class NYC LULC where the GROUND TRUTH IS ALSO LULC, this is a slightly pathological case. The cleaner TiM ablation would use a different intermediate modality (NDVI from S2 β†’ LULC, or LULC from S1 alone). Worth testing both.
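The cleaner ablation can be expressed as another config delta in the same style as above. Hedged sketch: whether the TiM backbone accepts NDVI as a tim_modalities value, and the exact modality key names, are assumptions here, not verified against the terramind repo.

```yaml
# hypothetical delta for the NDVI-intermediate ablation (NDVI from S2 -> LULC)
model:
  init_args:
    model_args:
      backbone: terramind_v1_base_tim
      tim_modalities: [NDVI]            # assumption: NDVI supported as a TiM modality
      backbone_modalities: [S2L2A]      # S2 only, so NDVI is genuinely derivable
      # rest unchanged from Phase 2
```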

Plan

  1. Scaffold (this file) β€” done.
  2. Write tim_smoke.py β€” tiny smoke run to confirm TiM model loads and trains on our NYC dataset without architectural changes.
  3. Write phase3_tim.yaml β€” the TiM-enabled training config.
  4. Run the fine-tune (~6 GPU-hr).
  5. Eval against Phase-2 (no-TiM) on the same 64-chip held-out test split. Same metrics: per-class IoU, overall mIoU, Pixel_Accuracy, F1.
  6. Publish as msradam/TerraMind-base-NYC-TiM-LULC if it beats Phase 2 by at least 1pp on test mIoU.
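A minimal sketch of what step 2's tim_smoke.py could be: rather than loading the model directly, run the existing CLI for a single batch. The `--trainer.fast_dev_run` override is a Lightning-CLI convention and an assumption here, not verified against terratorch.

```python
import subprocess
from typing import List

def build_smoke_cmd(config_path: str) -> List[str]:
    """Build a one-batch smoke-run command for the TiM config.

    fast_dev_run makes Lightning run a single train/val batch, which is
    enough to confirm the TiM backbone loads and a forward/backward pass
    works on our NYC chips, without paying for a full fine-tune.
    """
    return [
        "terratorch", "fit",
        "--config", config_path,
        "--trainer.fast_dev_run", "true",  # assumption: Lightning-style CLI override
    ]

def run_smoke(config_path: str) -> None:
    """Execute the smoke run; raises CalledProcessError on failure."""
    subprocess.run(build_smoke_cmd(config_path), check=True)
```

Usage would be `run_smoke("/root/config_phase3_tim.yaml")` inside the container.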

Eval gate

Strong: > +2 pp mIoU vs Phase 2 → publish, headline result.
Acceptable: 0 to +2 pp → publish, "TiM stable on NYC" framing.
Negative: < 0 pp → publish negative result, document framing.
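The gate can be pinned down as a tiny helper so the eval script applies it mechanically. A sketch, assuming mIoU is reported in [0, 1]:

```python
def tim_gate(phase2_miou: float, tim_miou: float) -> str:
    """Classify the TiM result against the Phase-2 baseline.

    Thresholds follow the eval gate: > +2 pp is the strong/headline
    outcome, 0 to +2 pp is acceptable, and below 0 is a negative
    result (still published, with negative-result framing).
    """
    delta_pp = (tim_miou - phase2_miou) * 100.0  # fraction -> percentage points
    if delta_pp > 2.0:
        return "strong"
    if delta_pp >= 0.0:
        return "acceptable"
    return "negative"
```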

Risk

Medium. The TiM recipe needs adaptation from the sen1floods11 setup; 1–2 hours of debugging likely. Backup plan if the TiM model variant doesn't load: implement TiM-as-input-augmentation manually (run base TerraMind in generate mode for synthetic LULC, concatenate it to the input for the fine-tune).
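The backup plan's concatenation step can be sketched as follows, assuming the generate step returns a per-pixel class map (the generation API itself is left out, since base TerraMind's generate-mode interface isn't pinned down here):

```python
import numpy as np

def augment_with_synthetic_lulc(
    inputs: np.ndarray,          # (C, H, W) stacked S2/S1/DEM channels
    synthetic_lulc: np.ndarray,  # (H, W) integer class map from base TerraMind
    num_classes: int = 5,
) -> np.ndarray:
    """Concatenate one-hot synthetic LULC onto the input channels.

    Manual fallback for TiM: run the base model in generate mode for a
    synthetic LULC map, then feed it back in as extra input channels
    for the fine-tune instead of relying on the TiM backbone variant.
    """
    assert inputs.shape[1:] == synthetic_lulc.shape, "spatial dims must match"
    # One-hot encode the class map: (H, W) -> (H, W, K) -> (K, H, W)
    one_hot = np.eye(num_classes, dtype=inputs.dtype)[synthetic_lulc]
    one_hot = np.transpose(one_hot, (2, 0, 1))
    return np.concatenate([inputs, one_hot], axis=0)  # (C + K, H, W)
```

The downstream config would then just need its input channel count bumped by num_classes.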

Reproduction (planned)

docker exec terramind bash -c "
  terratorch fit --config /root/config_phase3_tim.yaml
  terratorch test --config /root/config_phase3_tim.yaml --ckpt_path .../best_val_loss.ckpt
"