# Phase 12 β€” TerraMind TiM (Thinking-in-Modalities) on NYC LULC
## Goal
Replicate IBM-ESA's headline TerraMind innovation β€” TiM (Thinking-in-Modalities) β€”
on our NYC LULC task. The hypothesis from the TerraMind paper (Jakubik et al.,
arXiv:2504.11171) is that generating intermediate modality tokens (e.g.,
synthetic LULC) *before* predicting the downstream target improves accuracy by 2–5 pp.
This is the *paper-grade differentiator* for the hackathon submission. To my
knowledge nobody has publicly reproduced TiM on NYC.
## Status
Scaffold + recipe research. Awaits GPU window.
## Recipe (from TerraMind GitHub examples)
The reference is `terramind_v1_small_sen1floods11.ipynb` in IBM's terramind
repo, which demonstrates TiM with `tim_modalities: [LULC]` for binary water
segmentation. Adaptation for our 5-class NYC LULC:
```yaml
# delta from training/terramind_v1_base_nyc_phase2.yaml
model:
  init_args:
    model_args:
      backbone: terramind_v1_base_tim          # vs terramind_v1_base
      tim_modalities: [LULC]                   # generate synthetic LULC tokens first
      backbone_modalities: [S2L2A, S1RTC, DEM] # actual inputs
      backbone_use_temporal: true
      backbone_temporal_n_timestamps: 4
# rest unchanged from Phase 2
```
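Since the config is expressed as a delta, a minimal sketch of what the overlay changes, assuming plain dict configs (the `deep_merge` helper is illustrative; terratorch's Lightning CLI handles config merging itself):

```python
# Illustrative helper: recursively overlay the TiM delta onto the
# Phase-2 config dict. Not part of terratorch; it only shows which
# keys the delta above touches and which survive from Phase 2.
def deep_merge(base: dict, delta: dict) -> dict:
    out = dict(base)
    for key, val in delta.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], val)
        else:
            out[key] = val
    return out

phase2 = {
    "model": {"init_args": {"model_args": {
        "backbone": "terramind_v1_base",
        "backbone_modalities": ["S2L2A", "S1RTC", "DEM"],
    }}},
}
tim_delta = {
    "model": {"init_args": {"model_args": {
        "backbone": "terramind_v1_base_tim",
        "tim_modalities": ["LULC"],
    }}},
}

merged = deep_merge(phase2, tim_delta)
args = merged["model"]["init_args"]["model_args"]
print(args["backbone"])             # terramind_v1_base_tim
print(args["backbone_modalities"])  # unchanged from Phase 2
```

The swap of `backbone` plus the new `tim_modalities` key is the whole change; everything else inherits from Phase 2.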
The TiM model generates synthetic LULC tokens from the input modalities,
then uses those tokens *as additional context* for the downstream LULC head.
Self-referential: the model "thinks in LULC" before predicting LULC.
For our 5-class NYC LULC task, where the *ground truth is also LULC*, this is
a slightly pathological case. The cleaner TiM ablation would use a different
intermediate modality (NDVI from S2 → LULC, or LULC from S1 alone). Worth
testing both.
## Plan
1. Scaffold (this file) β€” done.
2. Write `tim_smoke.py` β€” tiny smoke run to confirm TiM model loads and
trains on our NYC dataset without architectural changes.
3. Write `phase3_tim.yaml` β€” the TiM-enabled training config.
4. Run the fine-tune (~6 GPU-hr).
5. Eval against Phase-2 (no-TiM) on the same 64-chip held-out test split.
Same metrics: per-class IoU, overall mIoU, Pixel_Accuracy, F1.
6. Publish as `msradam/TerraMind-base-NYC-TiM-LULC` if it beats Phase 2 by
at least 1pp on test mIoU.
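The metrics in step 5 can be sketched as a pure-Python stand-in (illustrative only; the real run uses terratorch's test metrics), assuming the 5 NYC classes and flattened label maps:

```python
# Per-class IoU, mIoU, and pixel accuracy from a confusion matrix.
# Pure-Python stand-in for the terratorch eval; 5 NYC classes assumed.
def seg_metrics(pred, gt, num_classes=5):
    # conf[c_true][c_pred]
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        conf[g][p] += 1
    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(num_classes)) - tp
        fn = sum(conf[c]) - tp
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    pixel_acc = sum(conf[c][c] for c in range(num_classes)) / len(gt)
    valid = [i for i in ious if i == i]  # drop NaN for absent classes
    miou = sum(valid) / len(valid)
    return ious, miou, pixel_acc

# Toy 8-pixel example, not NYC data
pred = [0, 0, 1, 2, 2, 3, 4, 4]
gt   = [0, 1, 1, 2, 2, 3, 4, 0]
ious, miou, acc = seg_metrics(pred, gt)
print(acc)  # 0.75
```

Running the same function over Phase-2 and TiM predictions on the shared 64-chip test split gives the pp deltas the eval gate below keys on.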
## Eval gate
- Strong: > +2 pp test mIoU vs Phase 2 → publish as the headline result.
- Acceptable: 0 to +2 pp → publish with "TiM stable on NYC" framing.
- Negative: < 0 pp → publish as a documented negative result.
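The gate thresholds can be encoded as a tiny helper (function name and return strings are hypothetical):

```python
# Hypothetical eval-gate check: delta_pp is TiM test mIoU minus
# Phase-2 test mIoU, in percentage points.
def tim_gate(delta_pp: float) -> str:
    if delta_pp > 2.0:
        return "publish: headline result"
    if delta_pp >= 0.0:
        return "publish: 'TiM stable on NYC' framing"
    return "publish: documented negative result"

print(tim_gate(3.1))   # publish: headline result
print(tim_gate(0.4))   # publish: 'TiM stable on NYC' framing
print(tim_gate(-1.2))  # publish: documented negative result
```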
## Risk
Medium. The TiM recipe needs adaptation from the sen1floods11 setup; budget
1–2 hours of debugging. Backup plan if the TiM model variant doesn't load:
implement TiM as input augmentation manually (run the base TerraMind in
generate mode to produce synthetic LULC, then concatenate it to the input
for the fine-tune).
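The concatenation step of that backup plan can be sketched with NumPy; shapes and the random "synthetic" map are placeholders, not actual TerraMind generate output:

```python
import numpy as np

# Backup-plan sketch: one-hot encode a synthetic LULC map (stand-in for
# what base TerraMind's generate mode would emit) and stack it onto the
# input channels before fine-tuning.
NUM_CLASSES, H, W = 5, 64, 64
s2 = np.random.rand(12, H, W).astype(np.float32)          # stand-in S2L2A chip
synthetic_lulc = np.random.randint(0, NUM_CLASSES, (H, W))

one_hot = np.eye(NUM_CLASSES, dtype=np.float32)[synthetic_lulc]  # (H, W, 5)
one_hot = one_hot.transpose(2, 0, 1)                             # (5, H, W)

# Channel-concatenate: 12 spectral bands + 5 LULC planes = 17-channel input
augmented = np.concatenate([s2, one_hot], axis=0)
print(augmented.shape)  # (17, 64, 64)
```

The fine-tune config would then just need its input-channel count bumped to match.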
## Reproduction (planned)
```bash
docker exec terramind bash -c "
terratorch fit --config /root/config_phase3_tim.yaml
terratorch test --config /root/config_phase3_tim.yaml --ckpt_path .../best_val_loss.ckpt
"
```