# Phase 12: TerraMind TiM (Thinking-in-Modalities) on NYC LULC

## Goal

Replicate IBM-ESA's headline TerraMind innovation, TiM (Thinking-in-Modalities),
on our NYC LULC task. The hypothesis from the TerraMind paper (Jakubik et al.,
arXiv:2504.11171) is that generating intermediate modality tokens (e.g.,
synthetic LULC) BEFORE making the downstream prediction improves accuracy by 2–5 pp.

This is the *paper-grade differentiator* for the hackathon submission. To my
knowledge, nobody has publicly reproduced TiM on NYC.

## Status

Scaffold + recipe research. Awaits GPU window.

## Recipe (from TerraMind GitHub examples)

The reference is `terramind_v1_small_sen1floods11.ipynb` in IBM's `terramind`
repo, which shows TiM with `tim_modalities: [LULC]` for binary water segmentation.

Adaptation for our 5-class NYC LULC task:

```yaml
# delta from training/terramind_v1_base_nyc_phase2.yaml
model:
  init_args:
    model_args:
      backbone: terramind_v1_base_tim     # vs terramind_v1_base
      backbone_tim_modalities: [LULC]     # generate synthetic LULC tokens first (backbone_ prefix, like the keys below)
      backbone_modalities: [S2L2A, S1RTC, DEM]   # actual inputs
      backbone_use_temporal: true
      backbone_temporal_n_timestamps: 4
      # rest unchanged from Phase 2
```
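The `# rest unchanged from Phase 2` comment implies that the TiM config is the Phase 2 config with this delta layered on top. A minimal Python sketch of that layering follows. It assumes both files have already been parsed into dicts (e.g. with a YAML loader). The trimmed `phase2` dict and the partial `delta` are hypothetical stand-ins, not the real files.

```python
import copy


def deep_merge(base: dict, delta: dict) -> dict:
    """Return a copy of `base` with `delta` applied on top, recursively.

    Nested dicts are merged key by key; every other value in `delta`
    (scalars and lists alike) replaces the corresponding value outright.
    """
    merged = copy.deepcopy(base)
    for key, value in delta.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical, heavily trimmed stand-in for the Phase 2 config (already parsed from YAML).
phase2 = {
    "model": {
        "init_args": {
            "model_args": {
                "backbone": "terramind_v1_base",
                "backbone_modalities": ["S2L2A", "S1RTC", "DEM"],
                "num_classes": 5,
            }
        }
    }
}

# Only part of the TiM delta is shown here.
delta = {
    "model": {
        "init_args": {
            "model_args": {
                "backbone": "terramind_v1_base_tim",
                "backbone_use_temporal": True,
            }
        }
    }
}

phase3 = deep_merge(phase2, delta)
```

Lists are replaced rather than concatenated, so a `backbone_modalities` list in the delta fully specifies the inputs instead of appending to the Phase 2 list.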

The TiM model generates synthetic LULC tokens from the input modalities,
then uses those tokens AS ADDITIONAL CONTEXT for the downstream LULC head.
It is self-referential: the model "thinks in LULC" before predicting LULC.
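Below is a toy sketch of that data flow. It assumes the generated tokens are concatenated with the input tokens before encoding. `tim_forward` and the `toy_*` callables are hypothetical stand-ins, not TerraMind's API. In the real model, the pretrained transformer does the generating and encoding, and the tokens come from its modality tokenizers.

```python
from typing import Callable, List


def tim_forward(
    input_tokens: List[int],
    generate: Callable[[List[int]], List[int]],
    encode: Callable[[List[int]], List[float]],
    head: Callable[[List[float]], int],
) -> int:
    """Toy TiM data flow: think in an intermediate modality, then predict."""
    # 1. "Think": generate intermediate-modality tokens (e.g. synthetic LULC) from the real inputs.
    synthetic = generate(input_tokens)
    # 2. Append the synthetic tokens to the input sequence as extra context.
    context = input_tokens + synthetic
    # 3. Encode the combined sequence, then let the task head make the final prediction.
    return head(encode(context))


# Toy stand-ins; none of these are TerraMind components.
seen = {}


def toy_generate(tokens: List[int]) -> List[int]:
    return [t % 5 for t in tokens]  # one fake "LULC" token per input token


def toy_encode(tokens: List[int]) -> List[float]:
    seen["context_len"] = len(tokens)  # record how much context the encoder saw
    return [float(t) for t in tokens]


def toy_head(features: List[float]) -> int:
    return int(sum(features)) % 5  # fake 5-class decision


pred = tim_forward([3, 7, 11], toy_generate, toy_encode, toy_head)
```

The point of the sketch is step 2: the encoder sees twice as many tokens as the raw inputs provide. This is also why TiM costs extra compute compared with the Phase 2 backbone.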

For our 5-class NYC LULC task, where the GROUND TRUTH IS ALSO LULC, this is a
slightly pathological case. The cleaner TiM ablation would use a different
intermediate modality (NDVI from S2 → LULC, or LULC from S1 alone). Worth
testing both.

## Plan

1. Scaffold (this file): done.
2. Write `tim_smoke.py` β€” tiny smoke run to confirm TiM model loads and
   trains on our NYC dataset without architectural changes.
3. Write `phase3_tim.yaml` β€” the TiM-enabled training config.
4. Run the fine-tune (~6 GPU-hr).
5. Eval against Phase 2 (no TiM) on the same 64-chip held-out test split.
   Same metrics: per-class IoU, overall mIoU, Pixel_Accuracy, F1.
6. Publish as `msradam/TerraMind-base-NYC-TiM-LULC` if it beats Phase 2 by
   at least 1pp on test mIoU.
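For a quick independent check of the step 5 metrics, here is a minimal sketch that computes them from a confusion matrix. It assumes the label and prediction maps are flattened to per-pixel class indices with no-data pixels already removed. It also assumes the means are macro averages over classes that appear in either truth or prediction. `terratorch test` reports its own versions, and its averaging and `ignore_index` handling may differ.

```python
import math
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """Rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Per-class IoU, macro mIoU, pixel accuracy and macro F1 (assumes a non-empty matrix)."""
    k = len(cm)
    tp = [cm[c][c] for c in range(k)]
    fp = [sum(cm[r][c] for r in range(k)) - tp[c] for c in range(k)]  # predicted c, truth != c
    fn = [sum(cm[c]) - tp[c] for c in range(k)]                        # truth c, predicted != c
    iou, f1 = [], []
    for c in range(k):
        union = tp[c] + fp[c] + fn[c]
        # A class absent from both truth and prediction gets NaN and is skipped in the means.
        iou.append(tp[c] / union if union else math.nan)
        f1.append(2 * tp[c] / (union + tp[c]) if union else math.nan)  # 2TP / (2TP + FP + FN)
    present_iou = [v for v in iou if not math.isnan(v)]
    present_f1 = [v for v in f1 if not math.isnan(v)]
    total = sum(sum(row) for row in cm)
    return {
        "per_class_IoU": iou,
        "mIoU": sum(present_iou) / len(present_iou),
        "Pixel_Accuracy": sum(tp) / total,
        "F1": sum(present_f1) / len(present_f1),
    }


# Tiny 2-class example: 4 pixels, one class-0 pixel mislabeled as class 1.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
m = segmentation_metrics(cm)
```

Running the same function on the Phase 2 and TiM predictions gives the mIoU pair that the eval gate compares.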

## Eval gate

- Strong: > +2 pp mIoU vs Phase 2 → publish, headline result
- Acceptable: 0 to +2 pp → publish, "TiM stable on NYC" framing
- Negative: < 0 pp mIoU vs Phase 2 → publish as a documented negative result
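The gate can be written as a small function, sketched below. It assumes mIoU is reported as a fraction in [0, 1]. It also assumes that exactly 0 pp and exactly +2 pp both fall in the "acceptable" band, since the thresholds above do not say. The second return value applies the separate >= 1 pp rule from plan step 6 for publishing the model repo. `eval_gate` is a hypothetical helper name.

```python
from typing import Tuple


def eval_gate(tim_miou: float, phase2_miou: float) -> Tuple[str, bool]:
    """Map test mIoU (fractions in [0, 1]) to (gate tier, publish-model-repo flag)."""
    # Difference in percentage points, rounded to suppress float noise at the boundaries.
    delta_pp = round(100.0 * (tim_miou - phase2_miou), 6)
    if delta_pp > 2.0:
        tier = "strong"
    elif delta_pp >= 0.0:
        tier = "acceptable"  # assumption: exactly 0 and exactly +2 pp land here
    else:
        tier = "negative"
    # Plan step 6: the model repo is published only for a >= 1 pp mIoU gain.
    return tier, delta_pp >= 1.0
```

Rounding matters here: `0.62 - 0.60` is slightly above `0.02` in floating point, and without rounding it would be misclassified as "strong".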

## Risk

Medium. The TiM recipe needs adapting from the sen1floods11 setup; expect 1-2
hours of debugging. Backup plan if the TiM model variant doesn't load:
implement TiM-as-input-augmentation manually (run base TerraMind in
generate mode to produce synthetic LULC, then concatenate it to the input for fine-tuning).
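For the backup plan, the concatenation step might look like the sketch below. Several assumptions apply. The generate-mode output is already a per-pixel LULC class-index map at the chip's resolution, and the generation call itself is not shown. The synthetic map is appended as one one-hot plane per class in a channel-first layout. `append_synthetic_lulc` is a hypothetical helper. It uses nested lists so it runs without NumPy, whereas a real pipeline would do the same thing on tensors.

```python
from typing import List

Grid = List[List[float]]


def append_synthetic_lulc(channels: List[Grid], lulc: List[List[int]], num_classes: int = 5) -> List[Grid]:
    """Return the input channels plus one one-hot plane per LULC class (channel-first)."""
    height, width = len(lulc), len(lulc[0])
    for ch in channels:
        if len(ch) != height or any(len(row) != width for row in ch):
            raise ValueError("synthetic LULC map must match the input's spatial size")
    # One-hot encoding avoids implying an order between class ids (e.g. 4 is not "more" than 2).
    planes = [
        [[1.0 if lulc[y][x] == c else 0.0 for x in range(width)] for y in range(height)]
        for c in range(num_classes)
    ]
    return list(channels) + planes


# One 2x2 input channel plus a synthetic LULC map using classes 0, 2 and 4.
stacked = append_synthetic_lulc([[[0.1, 0.2], [0.3, 0.4]]], [[0, 4], [2, 2]])
```

With 5 classes, this adds 5 input channels. The fine-tune config's input channel count and normalization statistics would need to grow to match.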

## Reproduction (planned)

```bash
# set -e: skip the test step if fit fails
docker exec terramind bash -c "
  set -e
  terratorch fit --config /root/config_phase3_tim.yaml
  terratorch test --config /root/config_phase3_tim.yaml --ckpt_path .../best_val_loss.ckpt
"
```