# Droplet Runbook

_Last verified: 2026-05-06 (live introspection of droplet 569363721)_

## Spec

| Field | Value |
|-------|-------|
| Provider | DigitalOcean GPU Droplet (AMD Developer Cloud) |
| Droplet ID | 569363721 |
| Size slug | `gpu-mi300x1-192gb` (from hostname `0.17.1-gpu-mi300x1-192gb-devcloud-atl1`) |
| Region | `atl1` (Atlanta) |
| OS | Ubuntu 24.04.4 LTS |
| Kernel | 6.8.0-106-generic |
| Disk | 697 GiB root, 112 GiB used at inspection |
| RAM | 235 GiB |
| Swap | None |
| GPU | AMD Instinct MI300X VF (gfx942, model 0x74b5) |
| VRAM | 192 GiB (205,822,885,888 bytes) |
| ROCm SMI | 4.0.0+fc0010cf6a |
| ROCm lib | 7.8.0 (installed via `repo.radeon.com/rocm/apt/7.2`) |
| Docker | CE 29.4.2 (from official `download.docker.com/linux/ubuntu`) |

## Services

| Container | Image | Host Port | Container Port | Purpose |
|-----------|-------|-----------|----------------|---------|
| `vllm` | `vllm/vllm-openai-rocm:v0.17.1` | 8001 | 8000 | OpenAI-compatible LLM API (Granite 4.1 8B) |
| `riprap-models` | `riprap-models:latest` (local build) | 7860 | 7860 | GPU-specialist FastAPI service (Prithvi, TerraMind, GLiNER, Granite Embed, TTM) |

Both have `--restart unless-stopped`. Docker is systemd-enabled, so the full stack
auto-starts on reboot with no manual intervention.

A **Caddy** process runs natively (port 80, systemd service), configured to reverse-proxy
to `localhost:8888`. Nothing was listening on 8888 at inspection time, so this appears to
be a leftover placeholder, not load-bearing for Riprap.

## Existing provisioning scripts

| Script | What it does | Status |
|--------|--------------|--------|
| `scripts/deploy_droplet.sh` | Full bring-up: SSH verify, pull vLLM image, tar-stream + build riprap-models, start both containers, healthcheck. Idempotent: removes and recreates containers on re-run. | **Complete.** The canonical bring-up script. |
| `scripts/smoke_test_gpu.sh` | 4-check smoke: vLLM /v1/models, vLLM /v1/chat/completions, riprap-models /healthz, riprap-models /v1/granite-embed, /v1/gliner-extract. | **Complete.** Run after deploy to confirm the stack is live. |
| `scripts/save_droplet_image.sh` | Commits the running container, saves + compresses to a local tarball via scp. Useful as a fallback if the public-base Dockerfile rebuild fails. | Complete but **moot** once the bootstrap droplet is destroyed, because it requires a live droplet to extract from. |
| `scripts/probe_addresses.py` | End-to-end test against `/api/agent/stream` on the HF Space. 5/5 must pass before merging. | Not a droplet-setup script; it tests the full system end-to-end. |

**Gap:** No `update_hf_env.sh` exists. Updating HF Space env vars after a redeploy (new IP
or new token) is a manual `huggingface-cli space variables` command (see §Required
secrets below). This would be a good script to add.

**Gap:** No `redeploy.sh` wrapper exists. `deploy_droplet.sh` handles bring-up on a fresh
droplet but does not handle the HF Space variable update or the post-deploy probe run.
A `redeploy.sh` that chains `deploy_droplet.sh → huggingface-cli variables update →
probe_addresses.py` would complete the loop.

## Recreation steps

### 1. Provision the droplet

Use the DigitalOcean console or `doctl`. The exact size slug used was
`gpu-mi300x1-192gb`; pick `atl1` for the AMD Developer Cloud node type.

```bash
doctl compute droplet create riprap-gpu \
  --size gpu-mi300x1-192gb \
  --region atl1 \
  --image ubuntu-24-04-x64 \
  --ssh-keys <your-key-id>
```

Confirm `/dev/kfd` and `/dev/dri` are present before continuing:

```bash
ssh root@<new-ip> "ls /dev/kfd /dev/dri"
```

> **Note:** The AMD Developer Cloud GPU droplet image pre-installs ROCm and Docker.
> Steps 2–3 below document what was observed on the live system. On a fresh image from
> DigitalOcean's AMD GPU catalog they may already be satisfied; verify before running.

### 2. ROCm install

ROCm 7.2 was installed via the AMD repo. The following sources were present in
`/etc/apt/sources.list.d/`:

```
# /etc/apt/sources.list.d/rocm.list
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.2 noble main

# /etc/apt/sources.list.d/amdgpu.list
deb [arch=amd64,i386 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/30.30/ubuntu noble main

# /etc/apt/sources.list.d/device-metrics-exporter.list
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/device-metrics-exporter/apt/1.4.0 noble main
```

Key packages confirmed installed (versions at inspection):

```
amdgpu-dkms          1:6.16.13.30300000-2278356.24.04
amdgpu-core          1:7.2.70200-2278374.24.04
hip-runtime-amd      7.2.26015.70200-43~24.04
hipblas              3.2.0.70200-43~24.04
hipblaslt            1.2.1.70200-43~24.04
hipcc                1.1.1.70200-43~24.04
hipfft               1.0.22.70200-43~24.04
hiprand              3.1.0.70200-43~24.04
hipsolver            3.2.0.70200-43~24.04
hipsparse            4.2.0.70200-43~24.04
```

**Gap:** The exact `amdgpu-install` invocation used to bootstrap the host ROCm install
was not captured (the AMD GPU droplet image likely pre-installs it via cloud-init).
If building on a bare Ubuntu 24.04 node, follow the [official ROCm 7.2 install guide](https://rocm.docs.amd.com/en/docs-7.2.0/deploy/linux/quick_start.html).

### 3. Docker install

Docker CE was installed from the official Docker apt repo:

```
# /etc/apt/sources.list.d/docker.list
deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu noble stable
```

Packages installed:

```
docker-ce              5:29.4.2-2~ubuntu.24.04~noble
docker-ce-cli          5:29.4.2-2~ubuntu.24.04~noble
docker-buildx-plugin   0.33.0-1~ubuntu.24.04~noble
docker-compose-plugin  5.1.3-1~ubuntu.24.04~noble
```

Docker is **systemd-enabled**, so it starts automatically on reboot.

Standard install steps if needed:

```bash
install -m 0755 -d /etc/apt/keyrings
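# The key is served ASCII-armored; store it as-is so the .asc extension
# matches the file contents (a dearmored key belongs in a .gpg file).
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  -o /etc/apt/keyrings/docker.asc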
chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.asc] \
  https://download.docker.com/linux/ubuntu noble stable" \
  > /etc/apt/sources.list.d/docker.list
apt-get update
apt-get install -y docker-ce docker-ce-cli docker-buildx-plugin docker-compose-plugin
systemctl enable --now docker
```

### 4. Pull and launch vLLM

The full `docker run` reconstructed from live `docker inspect`:

```bash
TOKEN=<your-bearer-token>
HF_CACHE=/root/hf-cache

mkdir -p "$HF_CACHE"

docker run -d --name vllm \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --ipc=host \
  --shm-size=16g \
  -p 8001:8000 \
  -v "${HF_CACHE}:/root/.cache/huggingface" \
  -e GLOO_SOCKET_IFNAME=eth0 \
  -e VLLM_HOST_IP=127.0.0.1 \
  --restart unless-stopped \
  vllm/vllm-openai-rocm:v0.17.1 \
  --model ibm-granite/granite-4.1-8b \
  --host 0.0.0.0 \
  --port 8000 \
  --api-key "$TOKEN" \
  --max-model-len 8192 \
  --served-model-name granite-4.1-8b
```

**Observed startup behavior (from logs):**
- Architecture resolved as `GraniteForCausalLM` (vanilla decoder, no hybrid Mamba)
- dtype: `torch.bfloat16`
- tensor_parallel_size: 1, pipeline_parallel_size: 1, data_parallel_size: 1
- prefix caching: enabled, chunked prefill: enabled
- Model load: ~24 s, 16.46 GiB memory
- Graph capture: ~8 s, 0.45 GiB additional
- Total cold init: ~35 s from container start to API ready
- CUDA graph sizes: 51 sizes up to 512 tokens
- First-request ROCm kernel JIT can add 30–50 s; subsequent requests are 30–50× faster

**`GLOO_SOCKET_IFNAME=eth0` is required.** Without it gloo fails to bind and the engine
core never initialises. Do not remove this env var.
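
Since the API only answers after the roughly 35 s cold init listed above, a launch script can poll for readiness instead of sleeping for a fixed time. The following is a minimal sketch, not the logic inside `deploy_droplet.sh`: `wait_until` and `vllm_ready` are hypothetical helper names, and `curl` is assumed to be available on the machine running the check.

```bash
# Hypothetical helper: retry a command until it succeeds or a deadline passes.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  local timeout_s=$1 interval_s=$2 deadline
  shift 2
  deadline=$(( $(date +%s) + timeout_s ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1              # gave up: the command never succeeded in time
    fi
    sleep "$interval_s"
  done
  return 0
}

# Hypothetical probe: succeeds only when /v1/models answers with HTTP 2xx.
# Usage: vllm_ready <ip> <token>
vllm_ready() {
  curl -sf -o /dev/null --max-time 5 \
    -H "Authorization: Bearer $2" "http://$1:8001/v1/models"
}

# Example (not run automatically):
#   wait_until 90 3 vllm_ready "$IP" "$TOKEN" && echo "vLLM ready"
```

A successful poll only means the server is accepting requests; the first inference request can still be slow because of the kernel JIT noted above.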

### 5. Build and launch riprap-models

Build the image from the repo source. `deploy_droplet.sh`, run from your local machine,
handles the tar-stream and build automatically; the manual equivalent on the droplet is:

```bash
# On the droplet after source is synced to /workspace/riprap-build:
cd /workspace/riprap-build && \
  docker build \
    -t riprap-models:latest \
    -f services/riprap-models/Dockerfile \
    .
```

Full `docker run` reconstructed from live `docker inspect`:

```bash
TOKEN=<your-bearer-token>   # same token as vLLM
HF_CACHE=/root/hf-cache

docker run -d --name riprap-models \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --ipc=host \
  --shm-size=8g \
  -p 7860:7860 \
  -v "${HF_CACHE}:/root/.cache/huggingface" \
  -e RIPRAP_MODELS_API_KEY="$TOKEN" \
  --restart unless-stopped \
  riprap-models:latest
```

Entrypoint: `uvicorn main:app --host 0.0.0.0 --port 7860 --log-level info --proxy-headers`

**Key environment variables baked into the image** (not injected at runtime, no override needed):

```
ROCM_PATH=/opt/rocm
LD_LIBRARY_PATH=/opt/rocm/lib:/usr/local/lib:
PYTORCH_ROCM_ARCH=gfx942
AITER_ROCM_ARCH=gfx942;gfx950
MORI_GPU_ARCHS=gfx942;gfx950
HSA_NO_SCRATCH_RECLAIM=1
TOKENIZERS_PARALLELISM=false
SAFETENSORS_FAST_GPU=1
HIP_FORCE_DEV_KERNARG=1
HF_HOME=/root/.cache/huggingface
TRANSFORMERS_CACHE=/root/.cache/huggingface
```

**Python packages confirmed on running container** (at inspection):

| Package | Version |
|---------|---------|
| torch | 2.10.0 (ROCm build) |
| transformers | 4.57.6 |
| terratorch | 1.2.7 |
| torchgeo | 0.9.0 |
| torchvision | 0.24.1+d801a34 |
| torchaudio | 2.9.0+eaa9e4e |
| granite-tsfm | 0.3.6 |
| gliner | 0.2.26 |
| sentence-transformers | 5.4.1 |
| timm | 1.0.25 |
| safetensors | 0.8.0rc0 |
| segmentation_models_pytorch | 0.5.0 |
| pytorch-lightning | 2.6.1 |
| huggingface_hub | 0.36.2 |

> **`safetensors==0.8.0rc0` is a release candidate.** If the Dockerfile build fails on
> a fresh droplet with a pip resolution error on this package, bump it to the nearest
> stable release in `services/riprap-models/requirements-full.txt`.

**test_transform patch:** The v2 datamodule `test_transform` patch was confirmed present
in the running container at `/app/vllm/examples/pooling/plugin/prithvi_geospatial_mae_offline.py`.

**First-request model download:** The HF cache at `/root/hf-cache` is a bind mount that
survives container recreation. On a fresh droplet with an empty cache, the first request
to each specialist triggers a ~12 GB model download. Steady-state requests reuse the
cached weights.

### 6. Firewall

UFW was active at inspection. The relevant rules:

```bash
ufw limit 22/tcp      # SSH: rate-limited
ufw allow 80/tcp      # Caddy (reverse proxy placeholder)
ufw allow 443         # HTTPS (currently unused)
ufw deny 6601         # Explicit block
ufw deny 50061        # Explicit block
```

UFW **default is allow incoming**, so ports 8001 (vLLM) and 7860 (riprap-models) are
reachable from the public internet without an explicit allow rule. If you want to
restrict access to the HF Space only, add:

```bash
# Allow only HF Space egress IPs (check current HF IP ranges first)
ufw default deny incoming
ufw allow from <hf-space-ip-range> to any port 8001
ufw allow from <hf-space-ip-range> to any port 7860
ufw allow 22/tcp
```
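
When more than one source range has to be allowed, it may be easier to generate the rule set and review it before applying anything. The sketch below only prints `ufw` commands. `ufw_rules_for` is a hypothetical helper, the CIDRs in the usage note are documentation placeholders rather than real HF ranges, and the SSH rule is emitted before the default-deny so SSH is never left without a rule.

```bash
# Hypothetical helper: print (not run) the ufw commands that restrict the
# two service ports to the given source CIDRs.
# Usage: ufw_rules_for <cidr> [<cidr>...]
ufw_rules_for() {
  local cidr port
  # Keep SSH reachable before the default policy flips to deny.
  echo "ufw limit 22/tcp"
  for cidr in "$@"; do
    for port in 8001 7860; do   # vLLM and riprap-models host ports
      echo "ufw allow from $cidr to any port $port proto tcp"
    done
  done
  echo "ufw default deny incoming"
}
```

For example, `ufw_rules_for 192.0.2.0/24 198.51.100.0/24` prints six commands. After reviewing them, they can be run on the droplet by piping the output to `sh`. The existing rules for ports 80 and 443 are not touched by this sketch.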

### 7. Startup behavior

**The stack auto-starts on reboot with no manual intervention:**

- `dockerd` is managed by systemd (`systemctl is-enabled docker → enabled`)
- Both `vllm` and `riprap-models` containers have `RestartPolicy: unless-stopped`
- On reboot: systemd starts Docker → Docker restarts both containers automatically

**After a manual `docker stop` (e.g., for maintenance):** The containers will NOT
auto-start because `unless-stopped` respects explicit stops. Restart manually:

```bash
docker start vllm riprap-models
```

**After a full reboot or Docker daemon restart:** Auto-start kicks in; no action needed.

**vLLM cold-start warning:** After any restart, vLLM takes ~35 s to become ready
(`/v1/models` returns 200). ROCm kernel compilation adds another 30–50 s of latency on
the very first inference request. The HF Space will see timeouts during this window.
The `deploy_droplet.sh` healthcheck loop waits up to 90 s for vLLM to become ready.
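
The auto-start configuration described in this section can be re-verified on a rebuilt droplet rather than assumed. This is a minimal sketch to run on the droplet itself; `check_restart_policy` and `docker_enabled` are hypothetical helper names, and the only tools assumed are `docker` and `systemctl`.

```bash
# Hypothetical helper: report whether each named container has the
# "unless-stopped" restart policy. Returns non-zero if any does not.
# Usage: check_restart_policy <container> [<container>...]
check_restart_policy() {
  local rc=0 name policy
  for name in "$@"; do
    policy=$(docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' \
      "$name" 2>/dev/null) || policy=""
    if [ "$policy" = "unless-stopped" ]; then
      echo "PASS $name restart=$policy"
    else
      echo "FAIL $name restart=${policy:-unknown}"
      rc=1
    fi
  done
  return "$rc"
}

# Hypothetical helper: succeeds only if the docker unit is enabled at boot.
docker_enabled() {
  [ "$(systemctl is-enabled docker 2>/dev/null)" = "enabled" ]
}

# Example (not run automatically):
#   docker_enabled && check_restart_policy vllm riprap-models
```

This only inspects configuration. A container that was stopped by hand still reports `unless-stopped` and still needs the manual `docker start` shown above.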

## Required secrets

The stack uses a single shared bearer token for both services:

| Env var / flag | Container | Set where |
|----------------|-----------|-----------|
| `--api-key <TOKEN>` | `vllm` | Passed in `docker run` command (visible in `docker inspect`) |
| `RIPRAP_MODELS_API_KEY=<TOKEN>` | `riprap-models` | Passed in `docker run -e` flag (visible in `docker inspect`) |

**No `.env` file exists at `/root/.env` or `/etc/riprap*`.** The token is stored only
in the running container configuration. To read the live token in one command, without
opening an interactive shell on the droplet:

```bash
ssh root@<droplet-ip> "docker inspect riprap-models | python3 -c \
  \"import sys,json; c=json.load(sys.stdin)[0]; \
  [print(e) for e in c['Config']['Env'] if 'API_KEY' in e]\""
```

**The HF Space must also know the token and the droplet's IP.** Set these Space
variables after every redeploy (new droplet = new IP and new token):

```bash
VLLM_PORT=8001
MODELS_PORT=7860
NEW_IP=<new-droplet-ip>
TOKEN=<new-bearer-token>

huggingface-cli space variables \
  lablab-ai-amd-developer-hackathon/riprap-nyc \
  RIPRAP_LLM_PRIMARY=vllm \
  RIPRAP_LLM_BASE_URL="http://${NEW_IP}:${VLLM_PORT}/v1" \
  RIPRAP_LLM_API_KEY="$TOKEN" \
  RIPRAP_ML_BACKEND=remote \
  RIPRAP_ML_BASE_URL="http://${NEW_IP}:${MODELS_PORT}" \
  RIPRAP_ML_API_KEY="$TOKEN"

huggingface-cli space restart lablab-ai-amd-developer-hackathon/riprap-nyc
```
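
Both base URLs derive from the same IP, so it can help to generate the six pairs in one place and review them before pushing anything to the Space. The sketch below only prints `KEY=VALUE` lines. `space_vars` is a hypothetical helper, the variable names and default ports are the ones listed above, and how the pairs are then applied is left to the operator.

```bash
# Hypothetical helper: print the six Space variables for a given droplet.
# Usage: space_vars <ip> <token> [vllm_port] [models_port]
space_vars() {
  if [ "$#" -lt 2 ]; then
    echo "usage: space_vars <ip> <token> [vllm_port] [models_port]" >&2
    return 2
  fi
  local ip=$1 token=$2 vllm_port=${3:-8001} models_port=${4:-7860}
  printf '%s\n' \
    "RIPRAP_LLM_PRIMARY=vllm" \
    "RIPRAP_LLM_BASE_URL=http://${ip}:${vllm_port}/v1" \
    "RIPRAP_LLM_API_KEY=${token}" \
    "RIPRAP_ML_BACKEND=remote" \
    "RIPRAP_ML_BASE_URL=http://${ip}:${models_port}" \
    "RIPRAP_ML_API_KEY=${token}"
}
```

The output contains the bearer token, so it should not be pasted into logs or shared terminals.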

## Health check

Two curl commands that confirm both services are live:

```bash
TOKEN=<your-bearer-token>
IP=134.199.193.99   # replace with new IP after redeploy

# vLLM: should return JSON with granite-4.1-8b in the model list
curl -s -H "Authorization: Bearer $TOKEN" \
  "http://${IP}:8001/v1/models" | python3 -m json.tool

# riprap-models: should return {"ok": true, ...}
curl -s "http://${IP}:7860/healthz"
```
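
For use in a script, the `/healthz` check can be turned into an exit code instead of output to read by eye. This is a minimal sketch: `healthz_ok` is a hypothetical helper, the `{"ok": true, ...}` response shape is taken from the comment above, and `python3` is assumed to be installed wherever the check runs.

```bash
# Hypothetical helper: succeed only if /healthz returns JSON with "ok": true.
# Usage: healthz_ok <ip>
healthz_ok() {
  curl -s --max-time 10 "http://$1:7860/healthz" \
    | python3 -c 'import json, sys
try:
    body = json.load(sys.stdin)
except ValueError:
    sys.exit(1)  # empty or non-JSON response
sys.exit(0 if isinstance(body, dict) and body.get("ok") is True else 1)'
}

# Example (not run automatically):
#   healthz_ok "$IP" && echo "riprap-models healthy"
```

An unreachable host produces an empty response, which the parser treats as a failure, so connection errors and unhealthy responses both return non-zero.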

For a deeper check run the smoke-test script:

```bash
bash scripts/smoke_test_gpu.sh "$IP" "$TOKEN"
# Want: 4 PASS, 0 FAIL
```

For a full end-to-end check via the HF Space:

```bash
.venv/bin/python scripts/probe_addresses.py \
  --base https://lablab-ai-amd-developer-hackathon-riprap-nyc.hf.space
# Want: 5/5 PASS
```

## Gaps in existing scripts

| Missing script | What it needs to do |
|----------------|---------------------|
| `scripts/update_hf_env.sh` | Accept `<ip> <token>` args, run `huggingface-cli space variables` to update `RIPRAP_LLM_BASE_URL`, `RIPRAP_LLM_API_KEY`, `RIPRAP_ML_BASE_URL`, `RIPRAP_ML_API_KEY`, then restart the Space. Called as the last step after a successful `deploy_droplet.sh`. |
| `scripts/redeploy.sh` | Thin orchestrator: generate a fresh token, call `deploy_droplet.sh <ip> <token>`, then call `update_hf_env.sh <ip> <token>`, then run `probe_addresses.py` against the live Space to confirm 5/5. Reduces a 4-step redeploy to one command. |

`save_droplet_image.sh` is complete but only useful while a working droplet is alive.
The bootstrap droplet was destroyed 2026-05-06; this script cannot recover from that.
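
The orchestrator described in the table could look roughly like the sketch below. It is a starting point under stated assumptions, not an existing script: `run`, `new_token`, and `redeploy` are hypothetical names, `update_hf_env.sh` does not exist yet, and the `<ip> <token>` argument order for both scripts is the one proposed in the table rather than a verified interface.

```bash
# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Generate a fresh bearer token: 32 random bytes as 64 hex characters.
new_token() {
  od -An -N32 -tx1 /dev/urandom | tr -d ' \n'
}

# Usage: redeploy <ip> [token]
# Chains bring-up, Space variable update, and the end-to-end probe.
# Stops at the first step that fails.
redeploy() {
  if [ -z "${1:-}" ]; then
    echo "usage: redeploy <ip> [token]" >&2
    return 2
  fi
  local ip=$1 token
  token=${2:-$(new_token)}
  run bash scripts/deploy_droplet.sh "$ip" "$token" &&
    run bash scripts/update_hf_env.sh "$ip" "$token" &&
    run .venv/bin/python scripts/probe_addresses.py \
      --base https://lablab-ai-amd-developer-hackathon-riprap-nyc.hf.space
}
```

Running `DRY_RUN=1 redeploy <ip>` prints the three commands without executing them. The printed lines include the token, so dry-run output should be treated as a secret.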

## Destroy checklist

- [ ] Note the current `RIPRAP_MODELS_API_KEY` / vLLM `--api-key` value (or accept that
      you'll generate a fresh one on the next bring-up and update HF Space variables)
- [ ] Confirm the three NYC fine-tune artefacts exist on HF Hub (they do):
      `msradam/TerraMind-NYC-Adapters`, `msradam/Prithvi-EO-2.0-NYC-Pluvial`,
      `msradam/Granite-TTM-r2-Battery-Surge`
- [ ] Confirm no model weights exist only on the droplet: all are fetched from HF Hub
      on first request; the `/root/hf-cache` bind mount does NOT survive droplet deletion
- [ ] Run `bash scripts/smoke_test_gpu.sh <ip> <token>` one final time; record result
- [ ] Run `python scripts/probe_addresses.py` one final time; record result
- [ ] Update HF Space env vars to point at a new droplet OR confirm the Space gracefully
      falls back to Ollama (pill will turn amber)
- [ ] `doctl compute droplet delete 569363721` or destroy via DO console
- [ ] Verify HF Space is still serving after destroy:
      `curl -sf https://lablab-ai-amd-developer-hackathon-riprap-nyc.hf.space/api/backend`