File size: 6,955 Bytes
6a82282 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 | # Phase 11 β Live Sentinel imagery fetch for TerraMind-NYC
## Goal
Replace the cached Major-TOM monotemporal chips (frozen 2020-2025
acquisition window) with a *live* fetch path so that
`app/context/terramind_nyc.py` can run inference on the most-recent
qualifying Sentinel-2 + Sentinel-1 acquisition for any NYC point. The
imagery freshness is then a number Granite can cite alongside the
prediction.
## What live actually means here
Sentinel revisit times, honestly:
| Source | Native revisit | With cloud filter | STAC availability |
|---|---|---|---|
| Sentinel-2 (S2A + S2B) | 5 days | 5β15 days | < 24 h after acquisition |
| Sentinel-1 (S1A + S1C) | ~6 days | n/a (radar) | < 24 h after acquisition |
So "live" = "most-recent qualifying acquisition" = typically 1β7 days
old. We disclose the per-query age so a Granite synthesis can cite
exactly how fresh the imagery is.
## Sources tested
### probe_earth_search.py β Element 84 / AWS Open Data
Anonymous, no auth, COG-streamable. Result for Empire State Building:
| Modality | Result |
|---|---|
| Sentinel-2 L2A | acquired **1 day ago**, 7.0% cloud, 1.4 s chip read |
| Sentinel-1 GRD (raw slant-range) | acquired 4 days ago, **no embedded CRS**; needs RTC processing |
| Total wall-clock (S2 only) | **3.5 s** |
S2 is great. **GRD is unusable for our model**: it ships in slant range
without a CRS, so reprojection to a chip grid fails. We need RTC.
Earth Search's collection list as of 2026-05-05:
```
sentinel-2-l2a, sentinel-2-l1c, sentinel-2-c1-l2a, sentinel-2-pre-c1-l2a,
sentinel-1-grd,
cop-dem-glo-30, cop-dem-glo-90,
landsat-c2-l2, naip
```
Notably **no `sentinel-1-rtc`**. So Earth Search alone cannot serve the
SAR modality our model needs.
### probe_pc_s1rtc.py β Microsoft Planetary Computer
Anonymous via URL signing, has the `sentinel-1-rtc` collection. Result:
| Modality | Result |
|---|---|
| Sentinel-1 RTC | acquired **4 days ago**, EPSG:32618 (UTM-18N), 2.7 s chip read |
| Total wall-clock | **3.3 s** |
Despite our prior experience (May 3 evening showed >50% timeout rate),
PC was reliable and fast on May 4 evening. The flakiness appears
event-driven, not chronic.
## Sovereignty matrix
| Source | Host | Auth | Sovereignty | Verdict for Riprap |
|---|---|---|---|---|
| **ESA Copernicus Data Space (CDSE)** | ESA | Free registration | EU sovereign, authoritative | Best for production civic-tech, requires user-side credential setup |
| **NASA Earthdata / ASF** | NASA | Earthdata Login (free) | US sovereign, used by FEMA/USGS | Same registration friction as CDSE |
| **Element 84 / AWS Open Data** | AWS | None | Private cloud, public access | Zero-friction; data is ESA-authoritative; host is private |
| **Microsoft Planetary Computer** | Microsoft | None (URL signing) | Private cloud, public access | Zero-friction; flakiness risk |
The DATA is ESA Copernicus under Copernicus License regardless of host.
The HOST differs in sovereignty story.
## Recommended architecture
For Riprap's deployment story (anonymous-by-default, sovereignty-aware,
swap-in capable for credentialed sovereign sources):
```
Primary path (anonymous, zero-friction):
- Sentinel-2 L2A from Earth Search (Element 84 / AWS Open Data)
- Copernicus DEM from Earth Search (cop-dem-glo-30)
- Sentinel-1 RTC from Microsoft Planetary Computer (URL-signed)
Optional sovereign override (set RIPRAP_SENTINEL_SOURCE=cdse with creds):
- All modalities from ESA Copernicus Data Space directly
Disclosure in every briefing:
"Sentinel-2 acquired N days ago, Sentinel-1 acquired M days ago,
sourced from <host>. Data: ESA Copernicus License."
```
Per-query budget on a fresh fetch (uncached):
- Earth Search S2 + DEM: ~2 s
- PC S1 RTC: ~3 s
- Model inference: ~0.5 s
- **Total: ~5β6 s per query**
With per-MGRS-cell caching (chips don't change between revisits within
a 5-day window for the same scene), repeat queries hit local cache and
return in < 1 s.
## What changes in the integration
`app/context/terramind_nyc.py` (the new specialist) replaces its current
"load from local Major-TOM cache" path with a `fetch_recent_chips(lat, lon)`
function that tries Earth Search first, then PC for S1-RTC. Cache is keyed
by (s2_mgrs_tile, s2_acquisition_date) so cold-cache wall-clock is the
~5 s above and warm-cache is < 100 ms.
The output dict that goes into Granite's document context gains:
```python
{
...,
"s2_acquired_iso": "2026-05-04T16:01:44Z",
"s2_age_days": 1,
"s2_cloud_pct": 7.0,
"s2_source": "Element 84 Earth Search (ESA Copernicus License)",
"s1_acquired_iso": "2026-05-01T22:51:31Z",
"s1_age_days": 4,
"s1_source": "Microsoft Planetary Computer (ESA Copernicus License)",
"imagery_freshness_disclosed": True,
}
```
Granite can cite both ages and both sources directly.
## What this enables in the briefing
A Brighton Beach briefing currently can't say anything about *current*
imagery. After integration, it can:
> "Structural land cover at this 2.56 km tile is **78% developed,
> 7% open water, 14% green space** [terramind_nyc]. Sentinel-2 imagery
> acquired 1 day ago [esa_s2]; Sentinel-1 SAR acquired 4 days ago
> [esa_s1]. The high imperviousness limits stormwater absorption,
> compounding the address's coastal Sandy-zone exposure [sandy]."
Three new cite-able facts: imperviousness, S2 age, S1 age. All
defensible against ground truth.
## Honest limitations
- **Cloud cover.** When S2 is cloudy, the most-recent low-cloud
acquisition might be 7β15 days old. Disclosed per query.
- **PC reliability.** Bursty timeouts during high-load windows. Retry
logic + fallback to S2-only inference (zero-fill S1 channel) is
the right defensive posture.
- **No RTC anonymously.** Earth Search has no `sentinel-1-rtc` so we
depend on PC for S1. If PC is down, briefing falls back to S2-only
with explicit "S1 unavailable for this query" disclosure.
- **Sovereignty.** AWS Open Data and PC are private-cloud-hosted
mirrors of ESA-authoritative data. The data is sovereign; the host
is not. For deployments requiring full sovereignty, CDSE direct is
the swap-in path.
## What to land in `app/`
Two files when this experiment graduates:
1. `app/context/sentinel_live.py` β `fetch_recent_chips(lat, lon)` with
the multi-source fallback chain, retry logic, per-MGRS cell cache
2. `app/context/terramind_nyc.py` β replaces `load_local_chips()` with
a call to `sentinel_live.fetch_recent_chips`, otherwise unchanged
Plus tests in `tests/` against three NYC reference points (Manhattan
center, Brighton Beach, Bronx Zoo) with a mock STAC client for offline
CI.
## License + attribution
ESA Copernicus License: free for any use including commercial, with
attribution to Copernicus and the originating mission. Riprap's existing
attribution block needs to add "Sentinel-1 / Sentinel-2 imagery courtesy
of ESA Copernicus" alongside the existing NYC OpenData / NOAA / FEMA
attributions.
|