riprap-nyc/experiments/11_live_sentinel_fetch/RESULTS.md (last commit 6a82282 by seriffic, 3 days ago: "Backend evolution: Phases 1-10 specialists + agentic FSM + Mellea + LiteLLM router")

Phase 11 — Live Sentinel imagery fetch for TerraMind-NYC

Goal

Replace the cached Major-TOM monotemporal chips (frozen 2020-2025 acquisition window) with a live fetch path so that app/context/terramind_nyc.py can run inference on the most-recent qualifying Sentinel-2 + Sentinel-1 acquisition for any NYC point. The imagery freshness is then a number Granite can cite alongside the prediction.

What live actually means here

Sentinel revisit times, honestly:

Source	Native revisit	With cloud filter	STAC availability
Sentinel-2 (S2A + S2B)	5 days	5–15 days	< 24 h after acquisition
Sentinel-1 (S1A + S1C)	~6 days	n/a (radar)	< 24 h after acquisition

So "live" = "most-recent qualifying acquisition" = typically 1–7 days old. We disclose the per-query age so a Granite synthesis can cite exactly how fresh the imagery is.

Sources tested

probe_earth_search.py — Element 84 / AWS Open Data

Anonymous, no auth, COG-streamable. Result for Empire State Building:

Modality	Result
Sentinel-2 L2A	acquired 1 day ago, 7.0% cloud, 1.4 s chip read
Sentinel-1 GRD (ground-range, radar geometry)	acquired 4 days ago, no embedded CRS; needs RTC processing
Total wall-clock (S2 only)	3.5 s

S2 is great. GRD is unusable for our model. It ships in radar geometry, georeferenced only through ground control points and with no map CRS, so reprojection onto a chip grid fails. Its backscatter is also not terrain-corrected. We need RTC.
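The S2 "most-recent qualifying acquisition" step can be sketched as below. This sketch makes several assumptions. Search results are assumed to arrive as STAC item dicts. The 20% cloud threshold, the 30-day lookback and both function names are illustrative choices, not values from the probe. search_s2 shows one way to issue the query with the pystac-client library. Only the offline selection logic is exercised here.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional

EARTH_SEARCH_URL = "https://earth-search.aws.element84.com/v1"


def _parse(ts: str) -> datetime:
    # STAC datetimes end in "Z"; fromisoformat only accepts that suffix on Python 3.11+.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def most_recent_qualifying(
    items: List[Dict[str, Any]], max_cloud: float = 20.0
) -> Optional[Dict[str, Any]]:
    """Pick the newest STAC item dict whose eo:cloud_cover is <= max_cloud (hypothetical helper)."""
    ok = [
        it
        for it in items
        # Items without a cloud estimate are treated as fully cloudy and skipped.
        if it["properties"].get("eo:cloud_cover", 100.0) <= max_cloud
    ]
    if not ok:
        return None
    return max(ok, key=lambda it: _parse(it["properties"]["datetime"]))


def search_s2(
    lat: float, lon: float, lookback_days: int = 30, max_cloud: float = 20.0
) -> List[Dict[str, Any]]:
    """Live Earth Search query; needs network access and the pystac-client package."""
    from pystac_client import Client  # imported lazily so the selection logic runs offline

    end = datetime.now(timezone.utc)
    search = Client.open(EARTH_SEARCH_URL).search(
        collections=["sentinel-2-l2a"],
        intersects={"type": "Point", "coordinates": [lon, lat]},
        datetime=[end - timedelta(days=lookback_days), end],
        query={"eo:cloud_cover": {"lte": max_cloud}},
        max_items=100,
    )
    return [item.to_dict() for item in search.items()]
```

The cloud filter is applied both server-side and client-side. The client-side check means the selection stays correct even if a mirror ignores the `query` extension.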

Earth Search's collection list as of 2026-05-05:

sentinel-2-l2a, sentinel-2-l1c, sentinel-2-c1-l2a, sentinel-2-pre-c1-l2a,
sentinel-1-grd,
cop-dem-glo-30, cop-dem-glo-90,
landsat-c2-l2, naip

Notably no sentinel-1-rtc. So Earth Search alone cannot serve the SAR modality our model needs.
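The per-modality routing that follows from this gap can be written as a capability lookup. Each modality goes to the first anonymous host whose catalog lists the collection it needs. This is a sketch. The Earth Search set is the list above. The Planetary Computer set is an assumed partial list covering only what this experiment uses. The modality keys are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Collections per anonymous host. Earth Search: the list observed on 2026-05-05.
# Planetary Computer: an assumed partial list, limited to what this experiment uses.
CATALOGS: Dict[str, Set[str]] = {
    "earth-search": {
        "sentinel-2-l2a", "sentinel-2-l1c", "sentinel-2-c1-l2a", "sentinel-2-pre-c1-l2a",
        "sentinel-1-grd", "cop-dem-glo-30", "cop-dem-glo-90", "landsat-c2-l2", "naip",
    },
    "planetary-computer": {"sentinel-2-l2a", "sentinel-1-rtc", "cop-dem-glo-30"},
}

# Hypothetical modality keys -> the collection each needs (GRD is deliberately not used).
NEEDED = {"S2L2A": "sentinel-2-l2a", "DEM": "cop-dem-glo-30", "S1RTC": "sentinel-1-rtc"}


def route(
    preference: Iterable[str] = ("earth-search", "planetary-computer"),
) -> Dict[str, Tuple[str, str]]:
    """Map each modality to (host, collection), taking the first host that lists it."""
    order = list(preference)  # materialise so it can be scanned once per modality
    plan: Dict[str, Tuple[str, str]] = {}
    for modality, collection in NEEDED.items():
        host = next((h for h in order if collection in CATALOGS[h]), None)
        if host is None:
            raise LookupError(f"no anonymous host in {order} serves {collection}")
        plan[modality] = (host, collection)
    return plan
```

With Earth Search alone in the preference list, `route` raises a `LookupError` for sentinel-1-rtc. That is exactly the gap described above.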

probe_pc_s1rtc.py — Microsoft Planetary Computer

Anonymous via URL signing, has the sentinel-1-rtc collection. Result:

Modality	Result
Sentinel-1 RTC	acquired 4 days ago, EPSG:32618 (UTM-18N), 2.7 s chip read
Total wall-clock	3.3 s

Despite earlier trouble (a >50% timeout rate on the evening of May 3), PC was reliable and fast on the evening of May 4. The flakiness appears event-driven rather than chronic, though two evenings is a small sample.

Sovereignty matrix

Source	Host	Auth	Sovereignty	Verdict for Riprap
ESA Copernicus Data Space (CDSE)	ESA	Free registration	EU sovereign, authoritative	Best for production civic-tech, requires user-side credential setup
NASA Earthdata / ASF	NASA	Earthdata Login (free)	US sovereign, used by FEMA/USGS	Same registration friction as CDSE
Element 84 / AWS Open Data	AWS	None	Private cloud, public access	Zero-friction; data is ESA-authoritative; host is private
Microsoft Planetary Computer	Microsoft	None (URL signing)	Private cloud, public access	Zero-friction; flakiness risk

The DATA is ESA Copernicus data under the Copernicus License regardless of host. Only the HOST's sovereignty story differs.

Recommended architecture

For Riprap's deployment story (anonymous-by-default, sovereignty-aware, swap-in capable for credentialed sovereign sources):

Primary path (anonymous, zero-friction):
  - Sentinel-2 L2A   from Earth Search (Element 84 / AWS Open Data)
  - Copernicus DEM   from Earth Search (cop-dem-glo-30)
  - Sentinel-1 RTC   from Microsoft Planetary Computer (URL-signed)

Optional sovereign override (set RIPRAP_SENTINEL_SOURCE=cdse with creds):
  - All modalities   from ESA Copernicus Data Space directly

Disclosure in every briefing:
  "Sentinel-2 acquired N days ago, Sentinel-1 acquired M days ago,
   sourced from <host>. Data: ESA Copernicus License."
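Below is a minimal sketch of the override switch and the disclosure line. It assumes the mode is read from RIPRAP_SENTINEL_SOURCE as described above. Any value other than cdse is treated as the anonymous primary path. Function names are hypothetical, and CDSE credential handling is left out.

```python
import os
from typing import Mapping, Optional


def source_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "cdse" when the sovereign override is requested, else "anonymous"."""
    env = os.environ if env is None else env
    value = env.get("RIPRAP_SENTINEL_SOURCE", "").strip().lower()
    # Assumption: unknown values fall back to the anonymous path rather than failing.
    return "cdse" if value == "cdse" else "anonymous"


def _ago(n: int) -> str:
    return f"{n} day ago" if n == 1 else f"{n} days ago"


def disclosure(s2_age_days: int, s1_age_days: Optional[int], host: str) -> str:
    """Render the per-briefing freshness line; S1 may be missing (S2-only fallback)."""
    s1 = (
        f"Sentinel-1 acquired {_ago(s1_age_days)}"
        if s1_age_days is not None
        else "Sentinel-1 unavailable for this query"
    )
    return (
        f"Sentinel-2 acquired {_ago(s2_age_days)}, {s1}, "
        f"sourced from {host}. Data: ESA Copernicus License."
    )
```

Falling back silently on an unrecognised value keeps the anonymous default robust. A stricter deployment might prefer to raise, so that a typo in the override cannot quietly bypass the sovereign path.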

Per-query budget on a fresh fetch (uncached):

Earth Search S2 + DEM: ~2 s
PC S1 RTC: ~3 s
Model inference: ~0.5 s
Total: ~5–6 s per query
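Summing the stages reproduces the quoted total, assuming they run one after another:

```latex
t_{\text{cold}} \approx \underbrace{2\,\text{s}}_{\text{S2 + DEM}} + \underbrace{3\,\text{s}}_{\text{S1 RTC}} + \underbrace{0.5\,\text{s}}_{\text{inference}} = 5.5\,\text{s}
```

Suppose the Earth Search and PC fetches were issued concurrently (an assumption; this was not measured here). The bound would then drop toward max(2, 3) + 0.5 ≈ 3.5 s.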

With per-MGRS-cell caching, repeat queries hit the local cache and return in < 1 s, most of which is model inference. A cell's chips only change when a newer qualifying acquisition of that scene arrives, typically within the 5-day revisit window.

What changes in the integration

app/context/terramind_nyc.py (the new specialist) replaces its current "load from local Major-TOM cache" path with a fetch_recent_chips(lat, lon) function. That function fetches S2 L2A and the DEM from Earth Search and S1-RTC from PC. The cache is keyed by (s2_mgrs_tile, s2_acquisition_date). A cold-cache query therefore costs the ~5–6 s above, and a warm-cache lookup takes < 100 ms.
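A minimal on-disk version of that cache is sketched below, assuming one pickle file per key in a local directory. It also assumes the S2 item carries the STAC MGRS extension properties (mgrs:utm_zone, mgrs:latitude_band, mgrs:grid_square); that is an assumption about the item schema. ChipCache and cache_key are hypothetical names.

```python
import pickle
from pathlib import Path
from typing import Any, Callable, Dict, Tuple


def cache_key(s2_item: Dict[str, Any]) -> Tuple[str, str]:
    """(s2_mgrs_tile, s2_acquisition_date) from a STAC item dict, e.g. ("18TWL", "2026-05-04")."""
    p = s2_item["properties"]
    tile = f'{p["mgrs:utm_zone"]}{p["mgrs:latitude_band"]}{p["mgrs:grid_square"]}'
    return tile, p["datetime"][:10]  # date part of the ISO timestamp


class ChipCache:
    """Pickle-per-key cache under a local directory; one file per (tile, date)."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: Tuple[str, str]) -> Path:
        tile, date = key
        return self.root / f"{tile}_{date}.pkl"

    def get_or_fetch(self, key: Tuple[str, str], fetch: Callable[[], Any]) -> Any:
        path = self._path(key)
        if path.exists():  # warm cache: read from disk, no network
            return pickle.loads(path.read_bytes())
        value = fetch()  # cold cache: the multi-second STAC/COG fetch
        path.write_bytes(pickle.dumps(value))
        return value
```

The key carries only the S2 date. A newer S1 acquisition for the same S2 scene would therefore keep being served from cache until the next S2 scene arrives. Folding the S1 date into the key would avoid that, at the cost of more cold fetches.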

The output dict that goes into Granite's document context gains:

{
  ...,
  "s2_acquired_iso": "2026-05-04T16:01:44Z",
  "s2_age_days": 1,
  "s2_cloud_pct": 7.0,
  "s2_source": "Element 84 Earth Search (ESA Copernicus License)",
  "s1_acquired_iso": "2026-05-01T22:51:31Z",
  "s1_age_days": 4,
  "s1_source": "Microsoft Planetary Computer (ESA Copernicus License)",
  "imagery_freshness_disclosed": True,
}
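Here is one way to assemble these fields from the selected STAC items' properties, as a sketch. Ages are floored to whole days since acquisition; this is an assumption, since the document does not state a rounding rule. When S1 is missing, the S1 keys are set to None so the S2-only fallback stays visible to Granite. freshness_fields is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def _parse(ts: str) -> datetime:
    # Normalise the trailing "Z" for datetime.fromisoformat on Python < 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def _age_days(ts: str, now: datetime) -> int:
    return int((now - _parse(ts)).total_seconds() // 86400)  # floor to whole days


def freshness_fields(
    s2_props: Dict[str, Any],
    s1_props: Optional[Dict[str, Any]],
    s2_source: str,
    s1_source: Optional[str],
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build the imagery-freshness keys merged into the specialist's output dict."""
    now = now or datetime.now(timezone.utc)
    return {
        "s2_acquired_iso": s2_props["datetime"],
        "s2_age_days": _age_days(s2_props["datetime"], now),
        "s2_cloud_pct": s2_props.get("eo:cloud_cover"),
        "s2_source": s2_source,
        # S1 keys stay present but None when PC was unavailable (S2-only fallback).
        "s1_acquired_iso": s1_props["datetime"] if s1_props else None,
        "s1_age_days": _age_days(s1_props["datetime"], now) if s1_props else None,
        "s1_source": s1_source if s1_props else None,
        "imagery_freshness_disclosed": True,
    }
```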

Granite can cite both ages and both sources directly.

What this enables in the briefing

A Brighton Beach briefing currently can't say anything about current imagery. After integration, it can:

"Structural land cover at this 2.56 km tile is 78% developed, 7% open water, 14% green space [terramind_nyc]. Sentinel-2 imagery acquired 1 day ago [esa_s2]; Sentinel-1 SAR acquired 4 days ago [esa_s1]. The high imperviousness limits stormwater absorption, compounding the address's coastal Sandy-zone exposure [sandy]."

Three new citable facts: imperviousness, S2 age, S1 age. All three are defensible against ground truth.

Honest limitations

Cloud cover. When S2 is cloudy, the most-recent low-cloud acquisition might be 7–15 days old. Disclosed per query.
PC reliability. Bursty timeouts during high-load windows. Retry logic + fallback to S2-only inference (zero-fill S1 channel) is the right defensive posture.
No RTC anonymously. Earth Search has no sentinel-1-rtc so we depend on PC for S1. If PC is down, briefing falls back to S2-only with explicit "S1 unavailable for this query" disclosure.
Sovereignty. AWS Open Data and PC are private-cloud-hosted mirrors of ESA-authoritative data. The data is sovereign; the host is not. For deployments requiring full sovereignty, CDSE direct is the swap-in path.
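The retry-then-degrade posture from the PC reliability and RTC limitations can be sketched as follows. Several details are assumptions for illustration:

- the backoff schedule and attempt count;
- the two-band S1 placeholder (VV and VH);
- all function names.

A real HTTP client such as requests or httpx raises its own exception types. Those would need to be caught or mapped onto RETRYABLE.

```python
import time
from typing import Any, Callable, Dict, List, TypeVar

T = TypeVar("T")

# Assumption: the HTTP layer maps PC failures onto these built-in exception types.
RETRYABLE = (TimeoutError, ConnectionError)


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying RETRYABLE errors with exponential backoff (0.5 s, 1 s, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except RETRYABLE:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller decide on a fallback
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


def zero_fill_s1(height: int, width: int, bands: int = 2) -> List[List[List[float]]]:
    """Placeholder S1 input (bands x height x width zeros) for S2-only inference."""
    return [[[0.0] * width for _ in range(height)] for _ in range(bands)]


def fetch_with_s1_fallback(
    fetch_s2: Callable[[], Any],
    fetch_s1: Callable[[], Any],
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """S2 is required; S1 degrades to None plus an explicit disclosure note."""
    s2 = fetch_s2()
    try:
        return {"s2": s2, "s1": with_retries(fetch_s1, sleep=sleep), "s1_note": None}
    except RETRYABLE:
        return {"s2": s2, "s1": None, "s1_note": "S1 unavailable for this query"}
```

An S2 failure is deliberately not caught here. Without S2 there is nothing to run inference on, so that error propagates to the caller.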

What to land in `app/`

Two files when this experiment graduates:

app/context/sentinel_live.py: fetch_recent_chips(lat, lon) with the multi-source fallback chain, retry logic, and a per-MGRS-cell cache
app/context/terramind_nyc.py: replaces load_local_chips() with a call to sentinel_live.fetch_recent_chips, otherwise unchanged

Plus tests in tests/ against three NYC reference points (Manhattan center, Brighton Beach, Bronx Zoo) with a mock STAC client for offline CI.
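An offline CI test along those lines might look like the sketch below. Several pieces are hypothetical stand-ins rather than the real STAC client API:

- FakeStacClient and its search(lat, lon) interface;
- the latest_item_id helper;
- the reference coordinates, which are approximate.

The point is only that canned items replace network calls.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Approximate (lat, lon) for the three reference points; illustrative only.
REFERENCE_POINTS = {
    "manhattan_center": (40.7831, -73.9712),
    "brighton_beach": (40.5776, -73.9614),
    "bronx_zoo": (40.8506, -73.8769),
}


@dataclass
class FakeItem:
    id: str
    bbox: Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)
    datetime: str  # uniform UTC "Z" strings, so they sort lexicographically


class FakeStacClient:
    """Offline stand-in: returns canned items whose bbox contains the query point."""

    def __init__(self, items: List[FakeItem]) -> None:
        self.items = items
        self.calls: List[Tuple[float, float]] = []

    def search(self, lat: float, lon: float) -> List[FakeItem]:
        self.calls.append((lat, lon))
        return [
            it for it in self.items
            if it.bbox[0] <= lon <= it.bbox[2] and it.bbox[1] <= lat <= it.bbox[3]
        ]


def latest_item_id(client: FakeStacClient, lat: float, lon: float) -> Optional[str]:
    hits = client.search(lat, lon)
    return max(hits, key=lambda it: it.datetime).id if hits else None


def test_reference_points_resolve() -> None:
    nyc_box = (-74.3, 40.4, -73.6, 41.0)
    client = FakeStacClient([
        FakeItem("older", nyc_box, "2026-05-01T16:00:00Z"),
        FakeItem("newer", nyc_box, "2026-05-04T16:01:44Z"),
    ])
    for name, (lat, lon) in REFERENCE_POINTS.items():
        assert latest_item_id(client, lat, lon) == "newer", name
    assert len(client.calls) == len(REFERENCE_POINTS)
```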

License + attribution

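ESA Copernicus License: free for any use including commercial, with attribution to Copernicus and the originating mission. Riprap's existing attribution block needs to add "Sentinel-1 / Sentinel-2 imagery courtesy of ESA Copernicus" alongside the existing NYC OpenData / NOAA / FEMA attributions. The Copernicus Sentinel data legal notice gives the form "Copernicus Sentinel data [Year]". For derived outputs such as model predictions it gives "Contains modified Copernicus Sentinel data [Year]". Similar wording is accepted, but matching that form is the safest option.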