Self-contained droplet redeploy: Dockerfile + bring-up script
Three artifacts that turn the bootstrap-droplet's hand-built
container into a reproducible bring-up on any AMD ROCm GPU node:
- services/riprap-models/Dockerfile - extends the public
  rocm/pytorch:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.9.1
  image (same minor torch version as the bootstrap droplet's
  bespoke +git build) with our pinned terratorch / granite-tsfm /
  transformers / peft / sentence-transformers / gliner stack. Bakes
  in the MI300X tuning env (HIP_FORCE_DEV_KERNARG=1 etc.) so a fresh
  container doesn't need the "remember to set these" incantation.
- services/riprap-models/requirements-full.txt - exact pip pins
  captured from the running terramind container on 2026-05-05.
  Curated: only the leaves the Dockerfile installs on top of the
  ROCm PyTorch base; transitive deps resolve from these. Excludes
  torch / torchvision / torchaudio / amd-* (provided by the base).
- scripts/deploy_droplet.sh - idempotent one-shot bring-up. Takes
  an IP + bearer token, verifies SSH + GPU device files, pulls vLLM,
  builds riprap-models, starts both containers with --restart
  unless-stopped, and waits for /v1/models + /healthz. Safe to
  re-run on the same droplet (containers get rm -f'd and
  recreated). Exits non-zero on healthcheck failure, so it's
  CI-wrappable.
README runbook covers the destroy + redeploy flow end-to-end:
spin up a new droplet, run the script, update the HF Space env vars,
restart the Space, and run probe_addresses.py against the new stack.
What survives destruction: this repo and the HF Hub fine-tune
artefacts. What doesn't: the HF cache (~19 GB of weights,
re-downloaded on first request) and the bearer token (generate
fresh).
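The fresh-token step is a single command; since 24 is a multiple of 3, `openssl rand -base64 24` yields exactly 32 base64 characters with no `=` padding:

```bash
# Generate a fresh bearer token, as the README runbook does.
# 24 random bytes base64-encode to exactly 32 chars (no padding).
TOKEN=$(openssl rand -base64 24)
echo "${#TOKEN}"   # 32
```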
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- scripts/deploy_droplet.sh +200 -0
- services/riprap-models/Dockerfile +63 -0
- services/riprap-models/README.md +103 -19
- services/riprap-models/requirements-full.txt +65 -0
scripts/deploy_droplet.sh
@@ -0,0 +1,200 @@
#!/usr/bin/env bash
# Riprap GPU-droplet bring-up: vLLM + riprap-models, idempotent.
#
# Designed for a fresh AMD MI300X droplet (DigitalOcean GPU droplet,
# AMD Developer Cloud node, etc.) with nothing more than:
#   - Ubuntu 22.04 / 24.04
#   - Docker + AMD ROCm GPU drivers (kfd / dri device files)
#   - SSH root access
#
# The script SSHes to the droplet, ensures the right images are
# pulled, builds the riprap-models container from this repo, starts
# both services, and runs healthchecks. Re-running on the same
# droplet is idempotent: existing containers are removed and
# recreated cleanly.
#
# Usage:
#   scripts/deploy_droplet.sh <droplet-ip> <bearer-token>
#
# Example:
#   scripts/deploy_droplet.sh 129.212.181.238 "$(cat /tmp/riprap/vllm_token.txt)"
#
# Env knobs (optional, all have sensible defaults):
#   SSH_USER       default "root"
#   SSH_KEY        path to ssh key; default uses ssh-agent
#   VLLM_IMAGE     default "vllm/vllm-openai-rocm:v0.17.1"
#   VLLM_PORT      default 8001 (host) -> 8000 (container)
#   MODELS_PORT    default 7860 (host) -> 7860 (container)
#   MODEL_REPO     default "ibm-granite/granite-4.1-8b"
#   HF_CACHE_HOST  default "/root/hf-cache" on droplet
#   SKIP_BUILD     "1" to skip building the riprap-models image
#                  (assume it's already present on the droplet)
#
# Exits non-zero on any step that fails, including the final
# healthcheck, so this is safe to wrap in CI.
set -euo pipefail

if [ "$#" -lt 2 ]; then
  echo "Usage: $0 <droplet-ip> <bearer-token>" >&2
  exit 64
fi

DROPLET_IP="$1"
TOKEN="$2"

SSH_USER="${SSH_USER:-root}"
SSH_KEY_FLAG=""
if [ -n "${SSH_KEY:-}" ]; then
  SSH_KEY_FLAG="-i $SSH_KEY"
fi
SSH="ssh $SSH_KEY_FLAG -o StrictHostKeyChecking=accept-new -o ConnectTimeout=10 ${SSH_USER}@${DROPLET_IP}"
SCP="scp $SSH_KEY_FLAG -o StrictHostKeyChecking=accept-new"

VLLM_IMAGE="${VLLM_IMAGE:-vllm/vllm-openai-rocm:v0.17.1}"
VLLM_PORT="${VLLM_PORT:-8001}"
MODELS_PORT="${MODELS_PORT:-7860}"
MODEL_REPO="${MODEL_REPO:-ibm-granite/granite-4.1-8b}"
HF_CACHE_HOST="${HF_CACHE_HOST:-/root/hf-cache}"
SKIP_BUILD="${SKIP_BUILD:-0}"

REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"

echo "==> Riprap droplet bring-up"
echo "    droplet ip:  $DROPLET_IP"
echo "    vllm port:   $VLLM_PORT"
echo "    models port: $MODELS_PORT"
echo "    model repo:  $MODEL_REPO"
echo "    repo root:   $REPO_ROOT"
echo

# ---- 1. Verify SSH + droplet readiness -----------------------------------
echo "==> 1. SSH connectivity + GPU device check"
$SSH bash -s <<'REMOTE'
set -e
if ! command -v docker > /dev/null; then
  echo "[droplet] docker not installed; aborting" >&2
  exit 1
fi
if [ ! -e /dev/kfd ] || [ ! -e /dev/dri ]; then
  echo "[droplet] no AMD GPU device files (/dev/kfd or /dev/dri); aborting" >&2
  exit 1
fi
echo "[droplet] docker + AMD GPU device files present"
docker --version
REMOTE

# ---- 2. Pull vLLM image --------------------------------------------------
echo
echo "==> 2. Pull vLLM image (if not cached)"
$SSH "docker image inspect $VLLM_IMAGE > /dev/null 2>&1 || docker pull $VLLM_IMAGE"

# ---- 3. Sync riprap-models source to droplet -----------------------------
echo
echo "==> 3. Sync riprap-models source"
$SSH "mkdir -p /workspace/riprap-models /workspace/riprap-build"
# Sync Dockerfile + sources via tar over SSH (rsync may be missing on
# a minimal droplet; tar is part of any Linux base).
tar -C "$REPO_ROOT" -cf - services/riprap-models | \
  $SSH "tar -C /workspace/riprap-build -xf -"

# ---- 4. Build riprap-models image ----------------------------------------
if [ "$SKIP_BUILD" = "1" ]; then
  echo
  echo "==> 4. Skipping image build (SKIP_BUILD=1)"
else
  echo
  echo "==> 4. Build riprap-models image"
  echo "    (this takes ~10-20 min on first build; subsequent builds"
  echo "     reuse layer cache and are < 1 min)"
  $SSH "cd /workspace/riprap-build && \
        docker build \
          -t riprap-models:latest \
          -f services/riprap-models/Dockerfile \
          ."
fi

# ---- 5. Start vLLM container ---------------------------------------------
echo
echo "==> 5. Start vLLM container"
$SSH bash -s <<REMOTE
set -e
docker rm -f vllm > /dev/null 2>&1 || true
mkdir -p ${HF_CACHE_HOST}
docker run -d --name vllm \\
  --device=/dev/kfd --device=/dev/dri --group-add=video \\
  --ipc=host --shm-size=16g \\
  -p ${VLLM_PORT}:8000 \\
  -v ${HF_CACHE_HOST}:/root/.cache/huggingface \\
  -e GLOO_SOCKET_IFNAME=eth0 -e VLLM_HOST_IP=127.0.0.1 \\
  --restart unless-stopped \\
  ${VLLM_IMAGE} \\
  --model ${MODEL_REPO} \\
  --host 0.0.0.0 --port 8000 --api-key "${TOKEN}" \\
  --max-model-len 8192 --served-model-name granite-4.1-8b
echo "[droplet] vllm container started"
REMOTE

# ---- 6. Start riprap-models container ------------------------------------
echo
echo "==> 6. Start riprap-models container"
$SSH bash -s <<REMOTE
set -e
docker rm -f riprap-models > /dev/null 2>&1 || true
docker run -d --name riprap-models \\
  --device=/dev/kfd --device=/dev/dri --group-add=video \\
  --ipc=host --shm-size=8g \\
  -p ${MODELS_PORT}:7860 \\
  -v ${HF_CACHE_HOST}:/root/.cache/huggingface \\
  -e RIPRAP_MODELS_API_KEY="${TOKEN}" \\
  --restart unless-stopped \\
  riprap-models:latest
echo "[droplet] riprap-models container started"
REMOTE

# ---- 7. Healthchecks -----------------------------------------------------
echo
echo "==> 7. Healthchecks"
echo "    waiting up to 90s for vLLM to expose /v1/models..."
DEADLINE=$((SECONDS + 90))
while (( SECONDS < DEADLINE )); do
  if curl -sf --max-time 5 "http://${DROPLET_IP}:${VLLM_PORT}/v1/models" \
       -H "Authorization: Bearer ${TOKEN}" > /tmp/vllm-models.json 2>/dev/null; then
    echo "    vLLM ready: $(head -c 200 /tmp/vllm-models.json)..."
    break
  fi
  sleep 3
done
if (( SECONDS >= DEADLINE )); then
  echo "    vLLM did not become ready in 90s; tailing container logs:" >&2
  $SSH "docker logs --tail 30 vllm" >&2
  exit 1
fi

echo "    waiting up to 60s for riprap-models /healthz..."
DEADLINE=$((SECONDS + 60))
while (( SECONDS < DEADLINE )); do
  if curl -sf --max-time 5 "http://${DROPLET_IP}:${MODELS_PORT}/healthz" \
       > /tmp/models-health.json 2>/dev/null; then
    echo "    riprap-models ready: $(cat /tmp/models-health.json)"
    break
  fi
  sleep 2
done
if (( SECONDS >= DEADLINE )); then
  echo "    riprap-models did not become ready in 60s; tailing container logs:" >&2
  $SSH "docker logs --tail 30 riprap-models" >&2
  exit 1
fi

echo
echo "==> DONE"
echo "    vLLM           http://${DROPLET_IP}:${VLLM_PORT}/v1/models"
echo "    riprap-models  http://${DROPLET_IP}:${MODELS_PORT}/healthz"
echo
echo "Set these in your local env or HF Space variables:"
echo "  RIPRAP_LLM_PRIMARY=vllm"
echo "  RIPRAP_LLM_BASE_URL=http://${DROPLET_IP}:${VLLM_PORT}/v1"
echo "  RIPRAP_LLM_API_KEY=${TOKEN}"
echo "  RIPRAP_ML_BACKEND=remote"
echo "  RIPRAP_ML_BASE_URL=http://${DROPLET_IP}:${MODELS_PORT}"
echo "  RIPRAP_ML_API_KEY=${TOKEN}"
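The two wait loops in step 7 share one pattern: bash's builtin `SECONDS` counter raced against a computed deadline. A self-contained sketch of that pattern, with a local command standing in for the remote `curl`:

```bash
# Deadline-poll pattern from the healthcheck section, factored into a
# helper. `wait_until TIMEOUT CMD...` retries CMD until it succeeds
# or TIMEOUT seconds elapse; the exit status says which happened.
wait_until() {
  local timeout=$1; shift
  local deadline=$((SECONDS + timeout))
  while (( SECONDS < deadline )); do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  return 1
}

wait_until 5 true  && echo "ready"      # succeeds on first try
wait_until 2 false || echo "timed out"  # exhausts the deadline
```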

services/riprap-models/Dockerfile
@@ -0,0 +1,63 @@
# Riprap Models - droplet inference service.
#
# Self-contained ROCm + PyTorch image that runs every GPU-accelerated
# specialist Riprap consumes (Prithvi-NYC-Pluvial, TerraMind LULC +
# Buildings, Granite TTM r2, Granite Embedding 278M, GLiNER).
#
# Base: AMD's public ROCm 7.2.3 + Python 3.12 + PyTorch 2.9.1 release
# image. Same minor torch version as the bespoke MI300X image the
# bootstrap droplet was hand-built with (`torch==2.9.1+git8907517`),
# but pulled from a public registry so any fresh droplet can recreate
# the env without internal AMD wheels. The released 2.9.1 has the
# kernels we need: none of riprap-models calls into vLLM-specific
# attention paths, so the dev-build vs release-build delta is
# inconsequential for our forward passes.
#
# Build:  docker build -t riprap-models:latest -f Dockerfile ../..
# Layout: the build context is the project root so the COPY lines
#         below can reach `services/riprap-models/`.
FROM rocm/pytorch:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.9.1

ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    HF_HOME=/root/.cache/huggingface \
    TRANSFORMERS_CACHE=/root/.cache/huggingface \
    # MI300X tuning the running container uses; baking them in so a
    # bring-up doesn't require remembering the env-set incantation.
    HIP_FORCE_DEV_KERNARG=1 \
    HSA_NO_SCRATCH_RECLAIM=1 \
    PYTORCH_ROCM_ARCH=gfx942

# git is needed by some HF model-card downloads (terratorch yaml repos
# pull via the git protocol). curl for the healthcheck. libgl1 for
# rasterio's Pillow path. The base ROCm image is Ubuntu 24.04 and
# already includes most build-time deps we need.
RUN apt-get update && apt-get install -y --no-install-recommends \
      curl git libgl1 libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace/riprap-models

# Install deps in two layers so a code-only change doesn't bust the
# heavy ML wheel cache. requirements.txt holds the runtime-narrow
# packages the service imports; requirements-full.txt is the
# superset the FSM specialists pull in transitively (terratorch's
# kornia / albumentations chain, granite-tsfm's tsfm_public, etc.).
COPY services/riprap-models/requirements-full.txt /tmp/req-full.txt
RUN pip install --upgrade pip && \
    pip install -r /tmp/req-full.txt

# Service code itself. Cheap to invalidate; lands last.
COPY services/riprap-models/main.py /workspace/riprap-models/main.py
COPY services/riprap-models/requirements.txt /workspace/riprap-models/requirements.txt

EXPOSE 7860

# `--proxy-headers` so a future LB sees the right client IP. The
# /healthz route is unauthenticated by design (operators want
# readiness probes to work without secrets); /v1/* requires the
# bearer token via RIPRAP_MODELS_API_KEY.
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860", \
     "--log-level", "info", "--proxy-headers"]

services/riprap-models/README.md
@@ -22,35 +22,119 @@ Auth: bearer token on every `/v1/*` route via `RIPRAP_MODELS_API_KEY`.
Same shape as vLLM. `/healthz` is open so liveness probes don't need
auth.

## Deploy: fresh droplet (recommended)

Use the one-shot bring-up script. It works on any AMD ROCm GPU droplet
with Docker + GPU device files (`/dev/kfd`, `/dev/dri`) and SSH root
access. No prior container state required.

```bash
scripts/deploy_droplet.sh <droplet-ip> <bearer-token>
```

What it does, in order:

1. Verifies SSH + AMD GPU device files on the droplet
2. Pulls `vllm/vllm-openai-rocm:v0.17.1`
3. Tar-streams `services/riprap-models/` to `/workspace/riprap-build`
4. Builds `riprap-models:latest` from `services/riprap-models/Dockerfile`
   (base: `rocm/pytorch:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.9.1`,
   ~10-20 min on first build, < 1 min on rebuild)
5. Starts both containers (`vllm` on host port 8001, `riprap-models`
   on host port 7860) with `--restart unless-stopped` so they survive
   reboots
6. Waits up to 90 s for vLLM `/v1/models` and 60 s for
   riprap-models `/healthz`; exits non-zero if either check fails

Re-running on the same droplet is idempotent: existing containers
get `docker rm -f`'d and recreated.

Env knobs:

| Var | Default | Purpose |
|---|---|---|
| `SSH_USER` | `root` | SSH login |
| `SSH_KEY` | (ssh-agent) | path to private key |
| `VLLM_PORT` | `8001` | host port mapping for vLLM |
| `MODELS_PORT` | `7860` | host port mapping for riprap-models |
| `MODEL_REPO` | `ibm-granite/granite-4.1-8b` | LLM repo |
| `HF_CACHE_HOST` | `/root/hf-cache` | HF cache mount on droplet |
| `SKIP_BUILD` | `0` | set `1` to skip the Dockerfile build |

After it returns, set the printed env vars in your local shell or HF
Space variables, run `scripts/probe_addresses.py` to verify, and
you're live.

## Deploy: extend an existing container (legacy)

If you already have a `terramind` container with the heavy ML deps
baked in (the bootstrap-droplet path), you can skip the Dockerfile
build and install the runtime deltas only:

```bash
ssh root@<ip> 'mkdir -p /workspace/riprap-models'
rsync -av --delete services/riprap-models/ root@<ip>:/workspace/riprap-models/
ssh root@<ip> bash <<'REMOTE'
docker cp /workspace/riprap-models terramind:/workspace/
docker exec -d -e RIPRAP_MODELS_API_KEY="$TOKEN" terramind \
  bash -c "cd /workspace/riprap-models && \
    pip install --no-cache-dir -r requirements.txt && \
    uvicorn main:app --host 0.0.0.0 --port 7860"
REMOTE
```

This path uses `requirements.txt` (deltas only); the Dockerfile path
above uses `requirements-full.txt` (everything). The service is
externally reachable at `http://<droplet-ip>:7860`, provided the host
port mapping was set when the container was created.

## Destroy + redeploy runbook

What survives a droplet destruction:

- `services/riprap-models/Dockerfile` + `requirements-full.txt` -
  every pinned dep, captured from the bootstrap droplet on 2026-05-05
- `scripts/deploy_droplet.sh` - the bring-up script
- HF Hub model artefacts - every fine-tune lives at
  `msradam/Prithvi-EO-2.0-NYC-Pluvial`,
  `msradam/TerraMind-NYC-Adapters`, and
  `msradam/Granite-TTM-r2-Battery-Surge`; the service pulls them
  fresh on first request

What does NOT survive:

- The HF cache at `${HF_CACHE_HOST}` (default `/root/hf-cache`) on
  the droplet - every redeploy re-downloads ~19 GB of weights
  (Granite 4.1 8b for vLLM ~16 GB, Prithvi v2 ~1.3 GB, TerraMind
  adapters ~600 MB, Granite Embedding ~600 MB, GLiNER ~400 MB,
  Granite TTM r2 ~6 MB). The first query after a redeploy takes
  ~30 s longer than steady-state because of the lazy model load
- The bearer token - generate a fresh one when redeploying

To redeploy:

```bash
# 1. Spin up a new GPU droplet (DigitalOcean / AMD Developer Cloud)
# 2. Copy your SSH key to it (DO usually does this for you)
# 3. Run:
TOKEN=$(openssl rand -base64 24)
scripts/deploy_droplet.sh <new-ip> "$TOKEN"

# 4. Update HF Space env vars to point at the new IP
huggingface-cli space variables \
  lablab-ai-amd-developer-hackathon/riprap-nyc \
  RIPRAP_LLM_BASE_URL=http://<new-ip>:8001/v1 \
  RIPRAP_LLM_API_KEY=$TOKEN \
  RIPRAP_ML_BASE_URL=http://<new-ip>:7860 \
  RIPRAP_ML_API_KEY=$TOKEN

# 5. Restart the HF Space so it picks up the new env vars
huggingface-cli space restart lablab-ai-amd-developer-hackathon/riprap-nyc

# 6. Verify end-to-end against the redeployed stack
.venv/bin/python scripts/probe_addresses.py \
  --base https://lablab-ai-amd-developer-hackathon-riprap-nyc.hf.space
```

## Local app config
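The rsync-free sync in the deploy script's step 3 is just `tar` on both ends of a pipe; the same idiom, demoed locally with temp dirs (over SSH the right-hand side runs via `ssh host "tar -C /workspace/riprap-build -xf -"`):

```bash
# Local demo of the tar-over-a-pipe sync pattern: no rsync needed on
# the receiving side, and the directory layout is preserved.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/services/riprap-models"
printf 'FROM scratch\n' > "$src/services/riprap-models/Dockerfile"

# Writer tars relative to $src; reader untars into $dst.
tar -C "$src" -cf - services/riprap-models | tar -C "$dst" -xf -

cat "$dst/services/riprap-models/Dockerfile"   # FROM scratch
```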

services/riprap-models/requirements-full.txt
@@ -0,0 +1,65 @@
# Riprap Models - full runtime requirements
#
# Pinned to the exact versions the bootstrap MI300X container ran with,
# captured via `pip freeze` inside the running `terramind` container on
# 2026-05-05. Keep these pins until something in the spec needs to
# change: the AMD ROCm + terratorch + tsfm_public stack has narrow
# version-compatibility windows.
#
# Torch / torchvision / torchaudio are NOT pinned here because they
# come from the base image (rocm/pytorch ROCm 7.2.3 + torch 2.9.1
# release). Pinning them again would make pip attempt a re-install
# of a different ABI and break the build.

# ---- Core HF / transformers stack ----------------------------------------
transformers==4.57.6
peft==0.18.1
accelerate==1.13.0
safetensors==0.8.0rc0
huggingface_hub==0.36.2
sentence-transformers==5.4.1
gliner==0.2.26

# ---- IBM Granite TimeSeries TTM r2 (TTM forecast specialists) ------------
granite-tsfm==0.3.6

# ---- Prithvi-EO / TerraMind serving stack --------------------------------
# terratorch pulls torchgeo, lightning, jsonargparse, kornia, timm, einops,
# albumentations, etc. Pinning the leaves explicitly so transitive bumps
# don't silently drift the FSM specialists' behaviour.
terratorch==1.2.7
torchgeo==0.9.0
torchmetrics==1.9.0
lightning==2.6.1
jsonargparse==4.48.0
albumentations==2.0.8
albucore==0.0.24
kornia==0.8.2
timm==1.0.25
einops==0.8.2

# ---- Geospatial I/O (used by the NYC-cropping helpers) -------------------
rasterio==1.5.0
pyproj==3.7.2
geopandas==1.1.3
shapely==2.1.2
pystac==1.14.3
pystac-client==0.9.0
rioxarray==0.22.0
xarray==2026.4.0
tifffile==2026.5.2
imageio==2.37.3

# ---- Numeric core --------------------------------------------------------
numpy==2.4.4
pandas==3.0.0
scipy==1.17.1
scikit-learn==1.8.0
pillow==12.1.1

# ---- Web / IO ------------------------------------------------------------
fastapi==0.135.1
uvicorn==0.41.0
pydantic==2.12.5
httpx==0.28.1
requests==2.32.5