# Droplet Runbook
_Last verified: 2026-05-06 (live introspection of droplet 569363721)_
## Spec
| Field | Value |
|-------|-------|
| Provider | DigitalOcean GPU Droplet (AMD Developer Cloud) |
| Droplet ID | 569363721 |
| Size slug | `gpu-mi300x1-192gb` (from hostname `0.17.1-gpu-mi300x1-192gb-devcloud-atl1`) |
| Region | `atl1` (Atlanta) |
| OS | Ubuntu 24.04.4 LTS |
| Kernel | 6.8.0-106-generic |
| Disk | 697 GiB root, 112 GiB used at inspection |
| RAM | 235 GiB |
| Swap | None |
| GPU | AMD Instinct MI300X VF (gfx942, model 0x74b5) |
| VRAM | 192 GiB (205,822,885,888 bytes) |
| ROCm SMI | 4.0.0+fc0010cf6a |
| ROCm lib | 7.8.0 (installed via `repo.radeon.com/rocm/apt/7.2`) |
| Docker | CE 29.4.2 (from official `download.docker.com/linux/ubuntu`) |
## Services
| Container | Image | Host Port | Container Port | Purpose |
|-----------|-------|-----------|----------------|---------|
| `vllm` | `vllm/vllm-openai-rocm:v0.17.1` | 8001 | 8000 | OpenAI-compatible LLM API (Granite 4.1 8B) |
| `riprap-models` | `riprap-models:latest` (local build) | 7860 | 7860 | GPU-specialist FastAPI service (Prithvi, TerraMind, GLiNER, Granite Embed, TTM) |
Both have `--restart unless-stopped`. Docker is systemd-enabled, so the full stack
auto-starts on reboot with no manual intervention.
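To verify on a live droplet (plain `docker inspect` Go templating, nothing Riprap-specific):
```bash
# Both lines should print "unless-stopped".
docker inspect -f '{{.Name}} {{.HostConfig.RestartPolicy.Name}}' vllm riprap-models
```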
A **Caddy** process runs natively (port 80, systemd service) configured to reverse-proxy
to `localhost:8888`. Nothing was listening on 8888 at inspection time; this appears to
be a leftover placeholder, not load-bearing for Riprap.
## Existing provisioning scripts
| Script | What it does | Status |
|--------|--------------|--------|
| `scripts/deploy_droplet.sh` | Full bring-up: SSH verify, pull vLLM image, tar-stream + build riprap-models, start both containers, healthcheck. Idempotent: removes and recreates containers on re-run. | **Complete.** The canonical bring-up script. |
| `scripts/smoke_test_gpu.sh` | Smoke test against vLLM (`/v1/models`, `/v1/chat/completions`) and riprap-models (`/healthz`, `/v1/granite-embed`, `/v1/gliner-extract`). | **Complete.** Run after deploy to confirm the stack is live. |
| `scripts/save_droplet_image.sh` | Commits the running container, saves + compresses to a local tarball via scp. Useful as a fallback if the public-base Dockerfile rebuild fails. | Complete but **moot** once the bootstrap droplet is destroyed; it requires a live droplet to extract from. |
| `scripts/probe_addresses.py` | End-to-end test against `/api/agent/stream` on the HF Space. 5/5 must pass before merging. | Not a droplet-setup script; it tests the full system end-to-end. |
**Gap:** No `update_hf_env.sh` exists. Updating HF Space env vars after a redeploy (new IP
or new token) is a manual `huggingface-cli space variables` command; see the "Required
secrets" section below. This would be a good script to add.
**Gap:** No `redeploy.sh` wrapper exists. `deploy_droplet.sh` handles bring-up on a fresh
droplet but does not handle the HF Space variable update or the post-deploy probe run.
A `redeploy.sh` that chains `deploy_droplet.sh` → HF Space variable update →
`probe_addresses.py` would complete the loop, as sketched below.
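A minimal sketch of that wrapper, assuming `deploy_droplet.sh` and the proposed
`update_hf_env.sh` both take `<ip> <token>` (the argument order and the token-generation
step are assumptions, not captured from the repo):
```bash
#!/usr/bin/env bash
# Hypothetical scripts/redeploy.sh: one-command redeploy loop.
set -euo pipefail
IP="$1"
TOKEN=$(openssl rand -hex 32)                  # fresh token per redeploy (assumption)
bash scripts/deploy_droplet.sh "$IP" "$TOKEN"  # bring up vllm + riprap-models
bash scripts/update_hf_env.sh "$IP" "$TOKEN"   # repoint the HF Space (see gaps table)
.venv/bin/python scripts/probe_addresses.py \
  --base https://lablab-ai-amd-developer-hackathon-riprap-nyc.hf.space
```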
## Recreation steps
### 1. Provision the droplet
Use the DigitalOcean console or `doctl`. The exact size slug used was
`gpu-mi300x1-192gb`; pick `atl1` for the AMD Developer Cloud node type.
```bash
doctl compute droplet create riprap-gpu \
--size gpu-mi300x1-192gb \
--region atl1 \
--image ubuntu-24-04-x64 \
--ssh-keys <your-key-id>
```
Confirm `/dev/kfd` and `/dev/dri` are present before continuing:
```bash
ssh root@<new-ip> "ls /dev/kfd /dev/dri"
```
> **Note:** The AMD Developer Cloud GPU droplet image pre-installs ROCm and Docker.
> Steps 2–3 below document what was observed on the live system. On a fresh image from
> DigitalOcean's AMD GPU catalog they may already be satisfied; verify before running.
### 2. ROCm install
ROCm 7.2 was installed via the AMD repo. The following sources were present in
`/etc/apt/sources.list.d/`:
```
# /etc/apt/sources.list.d/rocm.list
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.2 noble main
# /etc/apt/sources.list.d/amdgpu.list
deb [arch=amd64,i386 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/30.30/ubuntu noble main
# /etc/apt/sources.list.d/device-metrics-exporter.list
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/device-metrics-exporter/apt/1.4.0 noble main
```
Key packages confirmed installed (versions at inspection):
```
amdgpu-dkms 1:6.16.13.30300000-2278356.24.04
amdgpu-core 1:7.2.70200-2278374.24.04
hip-runtime-amd 7.2.26015.70200-43~24.04
hipblas 3.2.0.70200-43~24.04
hipblaslt 1.2.1.70200-43~24.04
hipcc 1.1.1.70200-43~24.04
hipfft 1.0.22.70200-43~24.04
hiprand 3.1.0.70200-43~24.04
hipsolver 3.2.0.70200-43~24.04
hipsparse 4.2.0.70200-43~24.04
```
**Gap:** The exact `amdgpu-install` invocation used to bootstrap the host ROCm install
was not captured (the AMD GPU droplet image likely pre-installs it via cloud-init).
If building on a bare Ubuntu 24.04 node, follow the [official ROCm 7.2 install guide](https://rocm.docs.amd.com/en/docs-7.2.0/deploy/linux/quick_start.html).
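To confirm a host install matches the inventory above, a quick sanity pass with the
standard ROCm CLI tools (a minimal check; version strings will drift on newer images):
```bash
rocm-smi --showproductname             # expect an AMD Instinct MI300X
rocminfo | grep -m1 gfx                # expect gfx942
dpkg -l | grep -E 'amdgpu-dkms|hip-runtime-amd'
```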
### 3. Docker install
Docker CE was installed from the official Docker apt repo:
```
# /etc/apt/sources.list.d/docker.list
deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu noble stable
```
Packages installed:
```
docker-ce 5:29.4.2-2~ubuntu.24.04~noble
docker-ce-cli 5:29.4.2-2~ubuntu.24.04~noble
docker-buildx-plugin 0.33.0-1~ubuntu.24.04~noble
docker-compose-plugin 5.1.3-1~ubuntu.24.04~noble
```
Docker is **systemd-enabled** and starts automatically on reboot.
Standard install steps if needed:
```bash
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
| gpg --dearmor -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.asc] \
https://download.docker.com/linux/ubuntu noble stable" \
> /etc/apt/sources.list.d/docker.list
apt-get update
apt-get install -y docker-ce docker-ce-cli docker-compose-plugin
systemctl enable --now docker
```
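To prove GPU passthrough works before pulling the real images, one option is a throwaway
container from a public ROCm image (the image here is illustrative, not what the droplet runs):
```bash
# Should list the MI300X from inside a container.
docker run --rm --device=/dev/kfd --device=/dev/dri --group-add video \
  rocm/rocm-terminal rocm-smi
```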
### 4. Pull and launch vLLM
The full `docker run` reconstructed from live `docker inspect`:
```bash
TOKEN=<your-bearer-token>
HF_CACHE=/root/hf-cache
mkdir -p "$HF_CACHE"
docker run -d --name vllm \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--ipc=host \
--shm-size=16g \
-p 8001:8000 \
-v "${HF_CACHE}:/root/.cache/huggingface" \
-e GLOO_SOCKET_IFNAME=eth0 \
-e VLLM_HOST_IP=127.0.0.1 \
--restart unless-stopped \
vllm/vllm-openai-rocm:v0.17.1 \
--model ibm-granite/granite-4.1-8b \
--host 0.0.0.0 \
--port 8000 \
--api-key "$TOKEN" \
--max-model-len 8192 \
--served-model-name granite-4.1-8b
```
**Observed startup behavior (from logs):**
- Architecture resolved as `GraniteForCausalLM` (vanilla decoder, no hybrid Mamba)
- dtype: `torch.bfloat16`
- tensor_parallel_size: 1, pipeline_parallel_size: 1, data_parallel_size: 1
- prefix caching: enabled, chunked prefill: enabled
- Model load: ~24 s, 16.46 GiB memory
- Graph capture: ~8 s, 0.45 GiB additional
- Total cold init: ~35 s from container start to API ready
- CUDA graph sizes: 51 sizes up to 512 tokens
- First-request ROCm kernel JIT can add 30–50 s; subsequent requests are 30–50× faster

**`GLOO_SOCKET_IFNAME=eth0` is required.** Without it, Gloo fails to bind and the engine
core never initialises. Do not remove this env var.
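Given the ~35 s cold init, gate traffic on readiness instead of sleeping blindly. A
minimal polling sketch (mirrors the 90 s budget `deploy_droplet.sh` uses; `IP` and
`TOKEN` are placeholders):
```bash
# Poll /v1/models every 5 s, up to 90 s total.
for _ in $(seq 1 18); do
  if curl -sf -o /dev/null -H "Authorization: Bearer $TOKEN" \
      "http://${IP}:8001/v1/models"; then
    echo "vLLM ready"
    break
  fi
  sleep 5
done
```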
### 5. Build and launch riprap-models
Build the image on the droplet from the synced repo source (`deploy_droplet.sh`
tar-streams the source from your local machine and runs this build automatically):
```bash
# On the droplet after source is synced to /workspace/riprap-build:
cd /workspace/riprap-build && \
docker build \
-t riprap-models:latest \
-f services/riprap-models/Dockerfile \
.
```
Full `docker run` reconstructed from live `docker inspect`:
```bash
TOKEN=<your-bearer-token> # same token as vLLM
HF_CACHE=/root/hf-cache
docker run -d --name riprap-models \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--ipc=host \
--shm-size=8g \
-p 7860:7860 \
-v "${HF_CACHE}:/root/.cache/huggingface" \
-e RIPRAP_MODELS_API_KEY="$TOKEN" \
--restart unless-stopped \
riprap-models:latest
```
Entrypoint: `uvicorn main:app --host 0.0.0.0 --port 7860 --log-level info --proxy-headers`
**Key environment variables baked into the image** (not injected at runtime, no override needed):
```
ROCM_PATH=/opt/rocm
LD_LIBRARY_PATH=/opt/rocm/lib:/usr/local/lib:
PYTORCH_ROCM_ARCH=gfx942
AITER_ROCM_ARCH=gfx942;gfx950
MORI_GPU_ARCHS=gfx942;gfx950
HSA_NO_SCRATCH_RECLAIM=1
TOKENIZERS_PARALLELISM=false
SAFETENSORS_FAST_GPU=1
HIP_FORCE_DEV_KERNARG=1
HF_HOME=/root/.cache/huggingface
TRANSFORMERS_CACHE=/root/.cache/huggingface
```
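To spot-check that a rebuilt image carries the same baked-in values (plain `docker exec`;
the grep pattern just picks out the ROCm-relevant ones):
```bash
docker exec riprap-models env | grep -E 'ROCM_PATH|PYTORCH_ROCM_ARCH|HSA_|HF_HOME'
```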
**Python packages confirmed on running container** (at inspection):
| Package | Version |
|---------|---------|
| torch | 2.10.0 (ROCm build) |
| transformers | 4.57.6 |
| terratorch | 1.2.7 |
| torchgeo | 0.9.0 |
| torchvision | 0.24.1+d801a34 |
| torchaudio | 2.9.0+eaa9e4e |
| granite-tsfm | 0.3.6 |
| gliner | 0.2.26 |
| sentence-transformers | 5.4.1 |
| timm | 1.0.25 |
| safetensors | 0.8.0rc0 |
| segmentation_models_pytorch | 0.5.0 |
| pytorch-lightning | 2.6.1 |
| huggingface_hub | 0.36.2 |
> **`safetensors==0.8.0rc0` is a release candidate.** If the Dockerfile build fails on
> a fresh droplet with a pip resolution error on this package, bump it to the nearest
> stable release in `services/riprap-models/requirements-full.txt`.
**test_transform patch:** The v2 datamodule `test_transform` patch was confirmed present
in the running container at `/app/vllm/examples/pooling/plugin/prithvi_geospatial_mae_offline.py`.
**First-request model download:** The HF cache at `/root/hf-cache` is a bind mount that
survives container recreation. On a fresh droplet with an empty cache, the first request
to each specialist triggers a ~12 GB model download. Steady-state requests reuse the
cached weights.
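To pay that download cost up front rather than on the first live request, a warm-up pass
can follow the deploy. The POST body below is an illustrative placeholder; check
`services/riprap-models` for the real request schema:
```bash
# Hypothetical warm-up: touch a specialist once so its weights land in
# /root/hf-cache before the HF Space sends real traffic.
curl -s "http://${IP}:7860/healthz"
curl -s -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"texts": ["warm-up"]}' \
  "http://${IP}:7860/v1/granite-embed"
```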
### 6. Firewall
UFW was active at inspection. The relevant rules:
```bash
ufw limit 22/tcp # SSH: rate-limited
ufw allow 80/tcp # Caddy (reverse proxy placeholder)
ufw allow 443 # HTTPS (currently unused)
ufw deny 6601 # Explicit block
ufw deny 50061 # Explicit block
```
UFW **default is allow incoming**, so ports 8001 (vLLM) and 7860 (riprap-models) are
reachable from the public internet without an explicit allow rule. If you want to
restrict access to the HF Space only, add:
```bash
# Allow only HF Space egress IPs (check current HF IP ranges first)
ufw default deny incoming
ufw allow from <hf-space-ip-range> to any port 8001
ufw allow from <hf-space-ip-range> to any port 7860
ufw allow 22/tcp
```
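After applying the restriction, confirm the default policy actually flipped:
```bash
ufw status verbose   # expect "Default: deny (incoming)" with the rules above in place
```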
### 7. Startup behavior
**The stack auto-starts on reboot with no manual intervention:**
- `dockerd` is managed by systemd (`systemctl is-enabled docker` returns `enabled`)
- Both `vllm` and `riprap-models` containers have `RestartPolicy: unless-stopped`
- On reboot: systemd starts Docker, and Docker restarts both containers automatically
**After a manual `docker stop` (e.g., for maintenance):** The containers will NOT
auto-start because `unless-stopped` respects explicit stops. Restart manually:
```bash
docker start vllm riprap-models
```
**After a full reboot or Docker daemon restart:** Auto-start kicks in; no action needed.
**vLLM cold-start warning:** After any restart, vLLM takes ~35 s to become ready
(`/v1/models` returns 200). ROCm kernel compilation adds another 30–50 s of latency on
the very first inference request. The HF Space will see timeouts during this window.
The `deploy_droplet.sh` healthcheck loop waits up to 90 s for vLLM to become ready.
## Required secrets
The stack uses a single shared bearer token for both services:
| Env var / flag | Container | Set where |
|----------------|-----------|-----------|
| `--api-key <TOKEN>` | `vllm` | Passed in `docker run` command (visible in `docker inspect`) |
| `RIPRAP_MODELS_API_KEY=<TOKEN>` | `riprap-models` | Passed in `docker run -e` flag (visible in `docker inspect`) |
**No `.env` file exists at `/root/.env` or `/etc/riprap*`.** The token is stored only
in the running container configuration. To read the live token back over SSH:
```bash
ssh root@<droplet-ip> "docker inspect riprap-models | python3 -c \
\"import sys,json; c=json.load(sys.stdin)[0]; \
[print(e) for e in c['Config']['Env'] if 'API_KEY' in e]\""
```
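When rotating, mint the fresh token locally before the redeploy (one option; any
high-entropy string works):
```bash
TOKEN=$(openssl rand -hex 32)   # 64 hex chars
```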
**The HF Space must also know the token and the droplet's IP.** Set these Space
variables after every redeploy (new droplet = new IP and new token):
```bash
VLLM_PORT=8001
MODELS_PORT=7860
NEW_IP=<new-droplet-ip>
TOKEN=<new-bearer-token>
huggingface-cli space variables \
lablab-ai-amd-developer-hackathon/riprap-nyc \
RIPRAP_LLM_PRIMARY=vllm \
RIPRAP_LLM_BASE_URL="http://${NEW_IP}:${VLLM_PORT}/v1" \
RIPRAP_LLM_API_KEY="$TOKEN" \
RIPRAP_ML_BACKEND=remote \
RIPRAP_ML_BASE_URL="http://${NEW_IP}:${MODELS_PORT}" \
RIPRAP_ML_API_KEY="$TOKEN"
huggingface-cli space restart lablab-ai-amd-developer-hackathon/riprap-nyc
```
## Health check
Two curl commands that confirm both services are live:
```bash
TOKEN=<your-bearer-token>
IP=134.199.193.99 # replace with new IP after redeploy
# vLLM β should return JSON with granite-4.1-8b in the model list
curl -s -H "Authorization: Bearer $TOKEN" \
"http://${IP}:8001/v1/models" | python3 -m json.tool
# riprap-models β should return {"ok": true, ...}
curl -s "http://${IP}:7860/healthz"
```
For a deeper check run the smoke-test script:
```bash
bash scripts/smoke_test_gpu.sh "$IP" "$TOKEN"
# Want: 4 PASS, 0 FAIL
```
For a full end-to-end check via the HF Space:
```bash
.venv/bin/python scripts/probe_addresses.py \
--base https://lablab-ai-amd-developer-hackathon-riprap-nyc.hf.space
# Want: 5/5 PASS
```
## Gaps in existing scripts
| Missing script | What it needs to do |
|----------------|---------------------|
| `scripts/update_hf_env.sh` | Accept `<ip> <token>` args, run `huggingface-cli space variables` to update `RIPRAP_LLM_BASE_URL`, `RIPRAP_LLM_API_KEY`, `RIPRAP_ML_BASE_URL`, `RIPRAP_ML_API_KEY`, then restart the Space. Called as the last step after a successful `deploy_droplet.sh`. A sketch follows this table. |
| `scripts/redeploy.sh` | Thin orchestrator: generate a fresh token, call `deploy_droplet.sh <ip> <token>`, then call `update_hf_env.sh <ip> <token>`, then run `probe_addresses.py` against the live Space to confirm 5/5. Reduces a 4-step redeploy to one command. |
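A hedged sketch of the missing `update_hf_env.sh`, reusing the variable names and CLI
invocation from the "Required secrets" section (verify the `huggingface-cli space`
subcommands against your installed CLI version):
```bash
#!/usr/bin/env bash
# Hypothetical scripts/update_hf_env.sh <ip> <token>
set -euo pipefail
IP="$1"; TOKEN="$2"
SPACE=lablab-ai-amd-developer-hackathon/riprap-nyc
huggingface-cli space variables "$SPACE" \
  RIPRAP_LLM_PRIMARY=vllm \
  RIPRAP_LLM_BASE_URL="http://${IP}:8001/v1" \
  RIPRAP_LLM_API_KEY="$TOKEN" \
  RIPRAP_ML_BACKEND=remote \
  RIPRAP_ML_BASE_URL="http://${IP}:7860" \
  RIPRAP_ML_API_KEY="$TOKEN"
huggingface-cli space restart "$SPACE"
```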
`save_droplet_image.sh` is complete but only useful while a working droplet is alive.
The bootstrap droplet was destroyed 2026-05-06; this script cannot recover from that.
## Destroy checklist
- [ ] Note the current `RIPRAP_MODELS_API_KEY` / vLLM `--api-key` value (or accept that
you'll generate a fresh one on the next bring-up and update HF Space variables)
- [ ] Confirm the three NYC fine-tune artefacts exist on HF Hub (they do):
`msradam/TerraMind-NYC-Adapters`, `msradam/Prithvi-EO-2.0-NYC-Pluvial`,
`msradam/Granite-TTM-r2-Battery-Surge`
- [ ] Confirm no model weights exist only on the droplet: all are fetched from HF Hub
  on first request; the `/root/hf-cache` bind mount does NOT survive droplet deletion
- [ ] Run `bash scripts/smoke_test_gpu.sh <ip> <token>` one final time; record result
- [ ] Run `python scripts/probe_addresses.py` one final time; record result
- [ ] Update HF Space env vars to point at a new droplet OR confirm the Space gracefully
falls back to Ollama (pill will turn amber)
- [ ] `doctl compute droplet delete 569363721` or destroy via DO console
- [ ] Verify HF Space is still serving after destroy:
`curl -sf https://lablab-ai-amd-developer-hackathon-riprap-nyc.hf.space/api/backend`