seriffic Claude Opus 4.7 (1M context) committed
Commit 5b8e335 · Parent: 0d831ce

fix(prithvi): dict-aware normalizer; expose riprap-models last_errors

Two follow-ups to the previous EO-fix commit. The TerraMind LoRA
fix (drop tiled_inference at 224×224) shipped clean and is now
firing on both Stones; Prithvi-NYC-Pluvial v2 was still erroring
with the same `'list' object has no attribute 'view'` despite
the tensor-typed Normalize.

Real root cause: IBM's run_model in inference.py calls
`datamodule.aug({'image': tensor})['image']`, passing a dict
and indexing the result. The previous patch wrapped a kornia
AugmentationSequential there, which in 0.7+ doesn't accept dict
input cleanly and trips the AttributeError deep inside its
internal storage on the first augmentation apply.
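
For reference, the contract the patched `aug` has to satisfy
(paraphrased from IBM's run_model; the variable names here are
illustrative, not the actual inference.py source):

    batch = {'image': pixels}       # pixels is a torch.Tensor
    batch = datamodule.aug(batch)   # must accept the dict...
    pixels = batch['image']         # ...and return one with 'image' set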

Fix: drop kornia entirely from the v2 patch path. Replace it
with a 12-line hand-rolled `_DictNormalize` that does the same
arithmetic, `(img - mean) / std`, and explicitly handles both
dict and tensor input shapes. Identical math, fewer moving
parts, no kornia version skew (usage sketch after the file
list). Applied symmetrically in:
services/riprap-models/main.py
app/flood_layers/prithvi_live.py
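
Usage sketch of the replacement; the class body mirrors the
diffs below, but the band count (6) and the mean/std values are
illustrative, not Prithvi's real normalization constants:

    import torch

    class _DictNormalize:
        def __init__(self, mean, std):
            self.mean = torch.as_tensor(mean).view(-1, 1, 1).float()
            self.std = torch.as_tensor(std).view(-1, 1, 1).float()

        def __call__(self, sample):
            if isinstance(sample, dict):
                img = sample["image"]
                out = (img - self.mean.to(img.device)) / self.std.to(img.device)
                return {**sample, "image": out}
            return (sample - self.mean.to(sample.device)) / self.std.to(sample.device)

    norm = _DictNormalize(mean=[0.5] * 6, std=[0.25] * 6)
    x = torch.rand(2, 6, 224, 224)
    assert torch.allclose(norm({"image": x})["image"], norm(x))

The per-call `.to(device)` keeps the stats wherever the batch
lands, so the same instance works unchanged on CPU and CUDA.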

Also: surface diagnostics through the proxy so operators don't
need to grep container logs (forwarding sketch below).
inference-vllm/proxy.py:
- GET /healthz now bubbles up `models_loaded` and
  `last_errors` from the riprap-models healthz body
- GET /v1/diag (auth-required) forwards the riprap-models
  operator-only diagnostic snapshot: what's loaded, the last
  per-stage error with traceback tail, and CUDA memory state
  per device
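
Sketch of the forwarding shape only, assuming a FastAPI proxy
with httpx; `RIPRAP_URL` and `require_auth` are hypothetical
stand-ins, not the actual proxy.py wiring:

    import os

    import httpx
    from fastapi import Depends, FastAPI, HTTPException

    app = FastAPI()
    # Hypothetical upstream address for the riprap-models service.
    RIPRAP_URL = os.environ.get("RIPRAP_URL", "http://riprap-models:8000")

    def require_auth():
        # Placeholder for the proxy's real auth dependency.
        ...

    @app.get("/healthz")
    async def healthz():
        async with httpx.AsyncClient() as client:
            r = await client.get(f"{RIPRAP_URL}/healthz")
        body = r.json()
        # Bubble the backend's model state up alongside the proxy's own status.
        return {"ok": r.status_code == 200,
                "models_loaded": body.get("models_loaded"),
                "last_errors": body.get("last_errors")}

    @app.get("/v1/diag")
    async def diag(_=Depends(require_auth)):
        async with httpx.AsyncClient() as client:
            r = await client.get(f"{RIPRAP_URL}/v1/diag")
        if r.status_code != 200:
            raise HTTPException(r.status_code, "riprap-models diag unavailable")
        return r.json()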

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

app/flood_layers/prithvi_live.py CHANGED
@@ -205,21 +205,40 @@ def _ensure_model():
     if getattr(getattr(m, 'datamodule', None),
                'test_transform', None) is None:
         import albumentations as A
-        import kornia.augmentation as _Ka
+        import torch as _torch
         from albumentations.pytorch import ToTensorV2
         m.datamodule.test_transform = A.Compose([ToTensorV2()])
         _old = m.datamodule.aug
-        # Pass torch.Tensor (not list via .tolist()).
-        # kornia 0.7+ stores values as-is and calls
-        # .view() on them at apply time; passing a
-        # Python list crashes with `AttributeError:
-        # 'list' object has no attribute 'view'`.
-        m.datamodule.aug = _Ka.AugmentationSequential(
-            _Ka.Normalize(_old.means.view(-1).detach().clone(),
-                          _old.stds.view(-1).detach().clone()),
-            data_keys=None)
+
+        # IBM's inference.py:188 calls
+        # `datamodule.aug({'image': tensor})['image']`.
+        # kornia's AugmentationSequential doesn't accept
+        # dict input cleanly and tripped the
+        # `'list' object has no attribute 'view'`
+        # error on the L4 deploy. Use a hand-rolled
+        # dict-aware normalizer instead; same math,
+        # fewer moving parts, no kornia version skew.
+        class _DictNormalize:
+            def __init__(self, mean, std):
+                self.mean = _torch.as_tensor(mean).view(-1, 1, 1).float()
+                self.std = _torch.as_tensor(std).view(-1, 1, 1).float()
+
+            def __call__(self, sample):
+                if isinstance(sample, dict):
+                    img = sample["image"]
+                    mean = self.mean.to(img.device)
+                    std = self.std.to(img.device)
+                    return {**sample, "image": (img - mean) / std}
+                mean = self.mean.to(sample.device)
+                std = self.std.to(sample.device)
+                return (sample - mean) / std
+
+        m.datamodule.aug = _DictNormalize(
+            _old.means.view(-1).detach().clone(),
+            _old.stds.view(-1).detach().clone(),
+        )
         log.info("prithvi_live: patched v2 datamodule transforms "
-                 "for IBM inference.py compat")
+                 "for IBM inference.py compat (dict-aware Normalize)")
     else:
         log.warning("prithvi_live: v2 yaml/ckpt not "
                     "discoverable in %s; falling back to base",
services/riprap-models/main.py CHANGED
@@ -144,21 +144,41 @@ def _load_prithvi():
     if getattr(getattr(m, 'datamodule', None),
                'test_transform', None) is None:
         import albumentations as A
-        import kornia.augmentation as _Ka
+        import torch as _torch
         from albumentations.pytorch import ToTensorV2
         m.datamodule.test_transform = A.Compose([ToTensorV2()])
         _old = m.datamodule.aug
-        # Pass torch.Tensor (not Python list via .tolist());
-        # kornia 0.7+ stores the values as-is and calls .view()
-        # on them at apply time. With a list, that fails with
-        # `AttributeError: 'list' object has no attribute 'view'`.
-        # Cloning detaches from the source datamodule's params.
-        m.datamodule.aug = _Ka.AugmentationSequential(
-            _Ka.Normalize(_old.means.view(-1).detach().clone(),
-                          _old.stds.view(-1).detach().clone()),
-            data_keys=None)
+
+        # IBM's inference.py:188 calls
+        # `datamodule.aug({'image': tensor})['image']`,
+        # passing a dict and indexing the result. The previous
+        # patch wrapped a kornia AugmentationSequential here,
+        # which doesn't natively accept dict input and tripped
+        # `'list' object has no attribute 'view'` deep inside
+        # kornia's internal storage on first inference. Drop
+        # kornia entirely and use a hand-rolled dict-aware
+        # normalizer; fewer moving parts, identical math.
+        class _DictNormalize:
+            def __init__(self, mean, std):
+                self.mean = _torch.as_tensor(mean).view(-1, 1, 1).float()
+                self.std = _torch.as_tensor(std).view(-1, 1, 1).float()
+
+            def __call__(self, sample):
+                if isinstance(sample, dict):
+                    img = sample["image"]
+                    mean = self.mean.to(img.device)
+                    std = self.std.to(img.device)
+                    return {**sample, "image": (img - mean) / std}
+                mean = self.mean.to(sample.device)
+                std = self.std.to(sample.device)
+                return (sample - mean) / std
+
+        m.datamodule.aug = _DictNormalize(
+            _old.means.view(-1).detach().clone(),
+            _old.stds.view(-1).detach().clone(),
+        )
         log.info("prithvi: patched v2 datamodule transforms "
-                 "for IBM inference.py compat")
+                 "for IBM inference.py compat (dict-aware Normalize)")
     else:
         log.info("prithvi: v2 unavailable, falling back to base")
         base_ckpt = hf_hub_download(