Self-contained droplet redeploy: Dockerfile + bring-up script
Three artifacts that turn the bootstrap-droplet's hand-built
container into a reproducible bring-up on any AMD ROCm GPU node:
- services/riprap-models/Dockerfile - extends the public
  rocm/pytorch:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.9.1
  image (same minor torch version as the bootstrap droplet's
  bespoke +git build) with our pinned terratorch / granite-tsfm /
  transformers / peft / sentence-transformers / gliner stack. Bakes
  in the MI300X tuning env (HIP_FORCE_DEV_KERNARG=1 etc.) so a fresh
  container doesn't need the "remember to set these" incantation.
- services/riprap-models/requirements-full.txt - exact pip pins
  captured from the running terramind container on 2026-05-05.
  Curated: only the leaves the Dockerfile installs on top of the
  ROCm PyTorch base; transitive deps resolve from these. Excludes
  torch / torchvision / torchaudio / amd-* (provided by the base).
- scripts/deploy_droplet.sh - idempotent one-shot bring-up. Takes
  an IP + bearer token, verifies SSH + GPU device files, pulls vLLM,
  builds riprap-models, starts both containers with --restart
  unless-stopped, and waits for /v1/models + /healthz. Safe to
  re-run on the same droplet (containers get rm -f'd and
  recreated). Exits non-zero on healthcheck failure, so it's
  CI-wrappable.
README runbook covers the destroy + redeploy flow end-to-end:
spin up a new droplet, run the script, update the HF Space env vars,
restart the Space, and run probe_addresses.py against the new stack.
What survives destruction: this repo and the HF Hub fine-tune
artefacts. What doesn't: the HF cache (~19 GB of weights,
re-downloaded on first request) and the bearer token (generate
fresh).
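The fresh-token step is a single command; since 24 is a multiple of 3, `openssl rand -base64 24` yields exactly 32 base64 characters with no `=` padding:

```bash
# Generate a fresh bearer token, as the README runbook does.
# 24 random bytes base64-encode to exactly 32 chars (no padding).
TOKEN=$(openssl rand -base64 24)
echo "${#TOKEN}"   # 32
```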
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- scripts/deploy_droplet.sh +200 -0
- services/riprap-models/Dockerfile +63 -0
- services/riprap-models/README.md +103 -19
- services/riprap-models/requirements-full.txt +65 -0
scripts/deploy_droplet.sh
@@ -0,0 +1,200 @@
#!/usr/bin/env bash
# Riprap GPU-droplet bring-up: vLLM + riprap-models, idempotent.
#
# Designed for a fresh AMD MI300X droplet (DigitalOcean GPU droplet,
# AMD Developer Cloud node, etc.) with nothing more than:
#   - Ubuntu 22.04 / 24.04
#   - Docker + AMD ROCm GPU drivers (kfd / dri device files)
#   - SSH root access
#
# The script SSHes to the droplet, ensures the right images are
# pulled, builds the riprap-models container from this repo, starts
# both services, and runs healthchecks. Re-running on the same
# droplet is idempotent: existing containers are removed and
# recreated cleanly.
#
# Usage:
#   scripts/deploy_droplet.sh <droplet-ip> <bearer-token>
#
# Example:
#   scripts/deploy_droplet.sh 129.212.181.238 "$(cat /tmp/riprap/vllm_token.txt)"
#
# Env knobs (optional, all have sensible defaults):
#   SSH_USER       default "root"
#   SSH_KEY        path to ssh key; default uses ssh-agent
#   VLLM_IMAGE     default "vllm/vllm-openai-rocm:v0.17.1"
#   VLLM_PORT      default 8001 (host) -> 8000 (container)
#   MODELS_PORT    default 7860 (host) -> 7860 (container)
#   MODEL_REPO     default "ibm-granite/granite-4.1-8b"
#   HF_CACHE_HOST  default "/root/hf-cache" on droplet
#   SKIP_BUILD     "1" to skip building the riprap-models image
#                  (assume it's already present on the droplet)
#
# Exits non-zero on any step that fails, including the final
# healthcheck, so this is safe to wrap in CI.
set -euo pipefail

if [ "$#" -lt 2 ]; then
  echo "Usage: $0 <droplet-ip> <bearer-token>" >&2
  exit 64
fi

DROPLET_IP="$1"
TOKEN="$2"

SSH_USER="${SSH_USER:-root}"
SSH_KEY_FLAG=""
if [ -n "${SSH_KEY:-}" ]; then
  SSH_KEY_FLAG="-i $SSH_KEY"
fi
SSH="ssh $SSH_KEY_FLAG -o StrictHostKeyChecking=accept-new -o ConnectTimeout=10 ${SSH_USER}@${DROPLET_IP}"
SCP="scp $SSH_KEY_FLAG -o StrictHostKeyChecking=accept-new"

VLLM_IMAGE="${VLLM_IMAGE:-vllm/vllm-openai-rocm:v0.17.1}"
VLLM_PORT="${VLLM_PORT:-8001}"
MODELS_PORT="${MODELS_PORT:-7860}"
MODEL_REPO="${MODEL_REPO:-ibm-granite/granite-4.1-8b}"
HF_CACHE_HOST="${HF_CACHE_HOST:-/root/hf-cache}"
SKIP_BUILD="${SKIP_BUILD:-0}"

REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"

echo "==> Riprap droplet bring-up"
echo "    droplet ip:  $DROPLET_IP"
echo "    vllm port:   $VLLM_PORT"
echo "    models port: $MODELS_PORT"
echo "    model repo:  $MODEL_REPO"
echo "    repo root:   $REPO_ROOT"
echo

# ---- 1. Verify SSH + droplet readiness -----------------------------------
echo "==> 1. SSH connectivity + GPU device check"
$SSH bash -s <<'REMOTE'
set -e
if ! command -v docker > /dev/null; then
  echo "[droplet] docker not installed; aborting" >&2
  exit 1
fi
if [ ! -e /dev/kfd ] || [ ! -e /dev/dri ]; then
  echo "[droplet] no AMD GPU device files (/dev/kfd or /dev/dri); aborting" >&2
  exit 1
fi
echo "[droplet] docker + AMD GPU device files present"
docker --version
REMOTE

# ---- 2. Pull vLLM image --------------------------------------------------
echo
echo "==> 2. Pull vLLM image (if not cached)"
$SSH "docker image inspect $VLLM_IMAGE > /dev/null 2>&1 || docker pull $VLLM_IMAGE"

# ---- 3. Sync riprap-models source to droplet -----------------------------
echo
echo "==> 3. Sync riprap-models source"
$SSH "mkdir -p /workspace/riprap-models /workspace/riprap-build"
# Sync Dockerfile + sources via tar over SSH (rsync may be missing on
# a minimal droplet; tar is part of any Linux base).
tar -C "$REPO_ROOT" -cf - services/riprap-models | \
  $SSH "tar -C /workspace/riprap-build -xf -"

# ---- 4. Build riprap-models image ----------------------------------------
if [ "$SKIP_BUILD" = "1" ]; then
  echo
  echo "==> 4. Skipping image build (SKIP_BUILD=1)"
else
  echo
  echo "==> 4. Build riprap-models image"
  echo "    (this takes ~10-20 min on first build; subsequent builds"
  echo "     reuse layer cache and are < 1 min)"
  $SSH "cd /workspace/riprap-build && \
        docker build \
          -t riprap-models:latest \
          -f services/riprap-models/Dockerfile \
          ."
fi

# ---- 5. Start vLLM container ---------------------------------------------
echo
echo "==> 5. Start vLLM container"
$SSH bash -s <<REMOTE
set -e
docker rm -f vllm > /dev/null 2>&1 || true
mkdir -p ${HF_CACHE_HOST}
docker run -d --name vllm \\
  --device=/dev/kfd --device=/dev/dri --group-add=video \\
  --ipc=host --shm-size=16g \\
  -p ${VLLM_PORT}:8000 \\
  -v ${HF_CACHE_HOST}:/root/.cache/huggingface \\
  -e GLOO_SOCKET_IFNAME=eth0 -e VLLM_HOST_IP=127.0.0.1 \\
  --restart unless-stopped \\
  ${VLLM_IMAGE} \\
  --model ${MODEL_REPO} \\
  --host 0.0.0.0 --port 8000 --api-key "${TOKEN}" \\
  --max-model-len 8192 --served-model-name granite-4.1-8b
echo "[droplet] vllm container started"
REMOTE

# ---- 6. Start riprap-models container ------------------------------------
echo
echo "==> 6. Start riprap-models container"
$SSH bash -s <<REMOTE
set -e
docker rm -f riprap-models > /dev/null 2>&1 || true
docker run -d --name riprap-models \\
  --device=/dev/kfd --device=/dev/dri --group-add=video \\
  --ipc=host --shm-size=8g \\
  -p ${MODELS_PORT}:7860 \\
  -v ${HF_CACHE_HOST}:/root/.cache/huggingface \\
  -e RIPRAP_MODELS_API_KEY="${TOKEN}" \\
  --restart unless-stopped \\
  riprap-models:latest
echo "[droplet] riprap-models container started"
REMOTE

# ---- 7. Healthchecks -----------------------------------------------------
echo
echo "==> 7. Healthchecks"
echo "    waiting up to 90s for vLLM to expose /v1/models..."
DEADLINE=$((SECONDS + 90))
while (( SECONDS < DEADLINE )); do
  if curl -sf --max-time 5 "http://${DROPLET_IP}:${VLLM_PORT}/v1/models" \
       -H "Authorization: Bearer ${TOKEN}" > /tmp/vllm-models.json 2>/dev/null; then
    echo "    vLLM ready: $(head -c 200 /tmp/vllm-models.json)..."
    break
  fi
  sleep 3
done
if (( SECONDS >= DEADLINE )); then
  echo "    vLLM did not become ready in 90s; tailing container logs:" >&2
  $SSH "docker logs --tail 30 vllm" >&2
  exit 1
fi

echo "    waiting up to 60s for riprap-models /healthz..."
DEADLINE=$((SECONDS + 60))
while (( SECONDS < DEADLINE )); do
  if curl -sf --max-time 5 "http://${DROPLET_IP}:${MODELS_PORT}/healthz" \
       > /tmp/models-health.json 2>/dev/null; then
    echo "    riprap-models ready: $(cat /tmp/models-health.json)"
    break
  fi
  sleep 2
done
if (( SECONDS >= DEADLINE )); then
  echo "    riprap-models did not become ready in 60s; tailing container logs:" >&2
  $SSH "docker logs --tail 30 riprap-models" >&2
  exit 1
fi

echo
echo "==> DONE"
echo "    vLLM           http://${DROPLET_IP}:${VLLM_PORT}/v1/models"
echo "    riprap-models  http://${DROPLET_IP}:${MODELS_PORT}/healthz"
echo
echo "Set these in your local env or HF Space variables:"
echo "  RIPRAP_LLM_PRIMARY=vllm"
echo "  RIPRAP_LLM_BASE_URL=http://${DROPLET_IP}:${VLLM_PORT}/v1"
echo "  RIPRAP_LLM_API_KEY=${TOKEN}"
echo "  RIPRAP_ML_BACKEND=remote"
echo "  RIPRAP_ML_BASE_URL=http://${DROPLET_IP}:${MODELS_PORT}"
echo "  RIPRAP_ML_API_KEY=${TOKEN}"
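The two wait loops in step 7 share one pattern: bash's builtin `SECONDS` counter raced against a computed deadline. A self-contained sketch of that pattern, with a local command standing in for the remote `curl`:

```bash
# Deadline-poll pattern from the healthcheck section, factored into a
# helper. `wait_until TIMEOUT CMD...` retries CMD until it succeeds
# or TIMEOUT seconds elapse; the exit status says which happened.
wait_until() {
  local timeout=$1; shift
  local deadline=$((SECONDS + timeout))
  while (( SECONDS < deadline )); do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  return 1
}

wait_until 5 true  && echo "ready"      # succeeds on first try
wait_until 2 false || echo "timed out"  # exhausts the deadline
```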

services/riprap-models/Dockerfile
@@ -0,0 +1,63 @@
# Riprap Models - droplet inference service.
#
# Self-contained ROCm + PyTorch image that runs every GPU-accelerated
# specialist Riprap consumes (Prithvi-NYC-Pluvial, TerraMind LULC +
# Buildings, Granite TTM r2, Granite Embedding 278M, GLiNER).
#
# Base: AMD's public ROCm 7.2.3 + Python 3.12 + PyTorch 2.9.1 release
# image. Same minor torch version as the bespoke MI300X image the
# bootstrap droplet was hand-built with (`torch==2.9.1+git8907517`),
# but pulled from a public registry so any fresh droplet can recreate
# the env without internal AMD wheels. The released 2.9.1 has the
# kernels we need: none of riprap-models calls into vLLM-specific
# attention paths, so the dev-build vs release-build delta is
# inconsequential for our forward passes.
#
# Build:  docker build -t riprap-models:latest -f Dockerfile ../..
# Layout: the build context is the project root so the COPY lines
#         below can reach `services/riprap-models/`.
FROM rocm/pytorch:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.9.1

ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    HF_HOME=/root/.cache/huggingface \
    TRANSFORMERS_CACHE=/root/.cache/huggingface \
    # MI300X tuning the running container uses; baking them in so a
    # bring-up doesn't require remembering the env-set incantation.
    HIP_FORCE_DEV_KERNARG=1 \
    HSA_NO_SCRATCH_RECLAIM=1 \
    PYTORCH_ROCM_ARCH=gfx942

# git is needed by some HF model-card downloads (terratorch yaml repos
# pull via the git protocol). curl for the healthcheck. libgl1 for
# rasterio's Pillow path. The base ROCm image is Ubuntu 24.04 and
# already includes most build-time deps we need.
RUN apt-get update && apt-get install -y --no-install-recommends \
      curl git libgl1 libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace/riprap-models

# Install deps in two layers so a code-only change doesn't bust the
# heavy ML wheel cache. requirements.txt holds the runtime-narrow
# packages the service imports; requirements-full.txt is the
# superset the FSM specialists pull in transitively (terratorch's
# kornia / albumentations chain, granite-tsfm's tsfm_public, etc.).
COPY services/riprap-models/requirements-full.txt /tmp/req-full.txt
RUN pip install --upgrade pip && \
    pip install -r /tmp/req-full.txt

# Service code itself. Cheap to invalidate; lands last.
COPY services/riprap-models/main.py /workspace/riprap-models/main.py
COPY services/riprap-models/requirements.txt /workspace/riprap-models/requirements.txt

EXPOSE 7860

# `--proxy-headers` so a future LB sees the right client IP. The
# /healthz route is unauthenticated by design (operators want
# readiness probes to work without secrets); /v1/* requires the
# bearer token via RIPRAP_MODELS_API_KEY.
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860", \
     "--log-level", "info", "--proxy-headers"]

services/riprap-models/README.md
@@ -22,35 +22,119 @@ Auth: bearer token on every `/v1/*` route via `RIPRAP_MODELS_API_KEY`.
Same shape as vLLM. `/healthz` is open so liveness probes don't need
auth.

## Deploy: fresh droplet (recommended)

Use the one-shot bring-up script. It works on any AMD ROCm GPU droplet
with Docker + GPU device files (`/dev/kfd`, `/dev/dri`) and SSH root
access. No prior container state required.

```bash
scripts/deploy_droplet.sh <droplet-ip> <bearer-token>
```

What it does, in order:

1. Verifies SSH + AMD GPU device files on the droplet
2. Pulls `vllm/vllm-openai-rocm:v0.17.1`
3. Tar-streams `services/riprap-models/` to `/workspace/riprap-build`
4. Builds `riprap-models:latest` from `services/riprap-models/Dockerfile`
   (base: `rocm/pytorch:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.9.1`,
   ~10-20 min on first build, < 1 min on rebuild)
5. Starts both containers (`vllm` on host port 8001, `riprap-models`
   on host port 7860) with `--restart unless-stopped` so they survive
   reboots
6. Waits up to 90 s for vLLM `/v1/models` and 60 s for
   riprap-models `/healthz`; exits non-zero if either check fails

Re-running on the same droplet is idempotent: existing containers
get `docker rm -f`'d and recreated.

Env knobs:

| Var | Default | Purpose |
|---|---|---|
| `SSH_USER` | `root` | SSH login |
| `SSH_KEY` | (ssh-agent) | path to private key |
| `VLLM_PORT` | `8001` | host port mapping for vLLM |
| `MODELS_PORT` | `7860` | host port mapping for riprap-models |
| `MODEL_REPO` | `ibm-granite/granite-4.1-8b` | LLM repo |
| `HF_CACHE_HOST` | `/root/hf-cache` | HF cache mount on droplet |
| `SKIP_BUILD` | `0` | set `1` to skip the Dockerfile build |

After it returns, set the printed env vars in your local shell or HF
Space variables, run `scripts/probe_addresses.py` to verify, and
you're live.

## Deploy: extend an existing container (legacy)

If you already have a `terramind` container with the heavy ML deps
baked in (the bootstrap-droplet path), you can skip the Dockerfile
build and install the runtime deltas only:

```bash
ssh root@<ip> 'mkdir -p /workspace/riprap-models'
rsync -av --delete services/riprap-models/ root@<ip>:/workspace/riprap-models/
ssh root@<ip> bash <<'REMOTE'
docker cp /workspace/riprap-models terramind:/workspace/
docker exec -d -e RIPRAP_MODELS_API_KEY="$TOKEN" terramind \
  bash -c "cd /workspace/riprap-models && \
    pip install --no-cache-dir -r requirements.txt && \
    uvicorn main:app --host 0.0.0.0 --port 7860"
REMOTE
```

This path uses `requirements.txt` (deltas only); the Dockerfile path
above uses `requirements-full.txt` (everything). The service is
externally reachable at `http://<droplet-ip>:7860`, provided the host
port mapping was set when the container was created.

## Destroy + redeploy runbook

What survives a droplet destruction:

- `services/riprap-models/Dockerfile` + `requirements-full.txt` -
  every pinned dep, captured from the bootstrap droplet on 2026-05-05
- `scripts/deploy_droplet.sh` - the bring-up script
- HF Hub model artefacts - every fine-tune lives at
  `msradam/Prithvi-EO-2.0-NYC-Pluvial`,
  `msradam/TerraMind-NYC-Adapters`, and
  `msradam/Granite-TTM-r2-Battery-Surge`; the service pulls them
  fresh on first request

What does NOT survive:

- The HF cache at `${HF_CACHE_HOST}` (default `/root/hf-cache`) on
  the droplet - every redeploy re-downloads ~19 GB of weights
  (Granite 4.1 8b for vLLM ~16 GB, Prithvi v2 ~1.3 GB, TerraMind
  adapters ~600 MB, Granite Embedding ~600 MB, GLiNER ~400 MB,
  Granite TTM r2 ~6 MB). The first query after a redeploy takes
  ~30 s longer than steady-state because of the lazy model load
- The bearer token - generate a fresh one when redeploying

To redeploy:

```bash
# 1. Spin up a new GPU droplet (DigitalOcean / AMD Developer Cloud)
# 2. Copy your SSH key to it (DO usually does this for you)
# 3. Run:
TOKEN=$(openssl rand -base64 24)
scripts/deploy_droplet.sh <new-ip> "$TOKEN"

# 4. Update HF Space env vars to point at the new IP
huggingface-cli space variables \
  lablab-ai-amd-developer-hackathon/riprap-nyc \
  RIPRAP_LLM_BASE_URL=http://<new-ip>:8001/v1 \
  RIPRAP_LLM_API_KEY=$TOKEN \
  RIPRAP_ML_BASE_URL=http://<new-ip>:7860 \
  RIPRAP_ML_API_KEY=$TOKEN

# 5. Restart the HF Space so it picks up the new env vars
huggingface-cli space restart lablab-ai-amd-developer-hackathon/riprap-nyc

# 6. Verify end-to-end against the redeployed stack
.venv/bin/python scripts/probe_addresses.py \
  --base https://lablab-ai-amd-developer-hackathon-riprap-nyc.hf.space
```

## Local app config
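The rsync-free sync in the deploy script's step 3 is just `tar` on both ends of a pipe; the same idiom, demoed locally with temp dirs (over SSH the right-hand side runs via `ssh host "tar -C /workspace/riprap-build -xf -"`):

```bash
# Local demo of the tar-over-a-pipe sync pattern: no rsync needed on
# the receiving side, and the directory layout is preserved.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/services/riprap-models"
printf 'FROM scratch\n' > "$src/services/riprap-models/Dockerfile"

# Writer tars relative to $src; reader untars into $dst.
tar -C "$src" -cf - services/riprap-models | tar -C "$dst" -xf -

cat "$dst/services/riprap-models/Dockerfile"   # FROM scratch
```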

services/riprap-models/requirements-full.txt
@@ -0,0 +1,65 @@
# Riprap Models - full runtime requirements
#
# Pinned to the exact versions the bootstrap MI300X container ran with,
# captured via `pip freeze` inside the running `terramind` container on
# 2026-05-05. Keep these pins until something in the spec needs to
# change: the AMD ROCm + terratorch + tsfm_public stack has narrow
# version-compatibility windows.
#
# Torch / torchvision / torchaudio are NOT pinned here because they
# come from the base image (rocm/pytorch ROCm 7.2.3 + torch 2.9.1
# release). Pinning them again would make pip attempt a re-install
# of a different ABI and break the build.

# ---- Core HF / transformers stack ----------------------------------------
transformers==4.57.6
peft==0.18.1
accelerate==1.13.0
safetensors==0.8.0rc0
huggingface_hub==0.36.2
sentence-transformers==5.4.1
gliner==0.2.26

# ---- IBM Granite TimeSeries TTM r2 (TTM forecast specialists) ------------
granite-tsfm==0.3.6

# ---- Prithvi-EO / TerraMind serving stack --------------------------------
# terratorch pulls torchgeo, lightning, jsonargparse, kornia, timm, einops,
# albumentations, etc. Pinning the leaves explicitly so transitive bumps
# don't silently drift the FSM specialists' behaviour.
terratorch==1.2.7
torchgeo==0.9.0
torchmetrics==1.9.0
lightning==2.6.1
jsonargparse==4.48.0
albumentations==2.0.8
albucore==0.0.24
kornia==0.8.2
timm==1.0.25
einops==0.8.2

# ---- Geospatial I/O (used by the NYC-cropping helpers) -------------------
rasterio==1.5.0
pyproj==3.7.2
geopandas==1.1.3
shapely==2.1.2
pystac==1.14.3
pystac-client==0.9.0
rioxarray==0.22.0
xarray==2026.4.0
tifffile==2026.5.2
imageio==2.37.3

# ---- Numeric core --------------------------------------------------------
numpy==2.4.4
pandas==3.0.0
scipy==1.17.1
scikit-learn==1.8.0
pillow==12.1.1

# ---- Web / IO ------------------------------------------------------------
fastapi==0.135.1
uvicorn==0.41.0
pydantic==2.12.5
httpx==0.28.1
requests==2.32.5