Dockerfile: CPU-tuned env for HF Spaces cpu-basic
cpu-basic doesn't have the disk for both the 3b and 8b models alongside the
torch + EO deps. Alias the 8b reconciler calls to 3b via RIPRAP_OLLAMA_8B_TAG
so the polished planner/Mellea/intents pipeline runs end-to-end.
RIPRAP_LLM_PRIMARY=ollama → explicit (default)
RIPRAP_OLLAMA_8B_TAG=granite4.1:3b → 8b alias remap
RIPRAP_MELLEA_MAX_ATTEMPTS=2 → fewer rerolls on slow CPU
OLLAMA_NUM_PARALLEL=1 → don't load a 2nd 3b copy
Reconciler quality drops vs 8b; the AMD MI300X vLLM path
(RIPRAP_LLM_PRIMARY=vllm) is the speed/quality lever for the demo.
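The remap works because the reconciler reads its model tag from the environment rather than hard-coding it. A minimal sketch of that resolution, with a hypothetical helper name and fallback (not Riprap's actual code):

```python
import os

def reconciler_model_tag(env=None):
    """Resolve the Ollama tag the 8b reconciler should call.

    On cpu-basic, RIPRAP_OLLAMA_8B_TAG=granite4.1:3b aliases the call
    to the smaller model; unset, it falls back to the real 8b tag
    (fallback value assumed here for illustration).
    """
    env = os.environ if env is None else env
    return env.get("RIPRAP_OLLAMA_8B_TAG", "granite4.1:8b")
```

With the Dockerfile's ENV in place, every reconciler call transparently hits the 3b model and no code changes are needed when the Space is upgraded and the alias is dropped.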
- Dockerfile +15 -4
@@ -1,12 +1,20 @@
 # Riprap → Hugging Face Spaces (Docker SDK) deployment.
 #
+# CPU-tuned variant for HF Spaces cpu-basic (free tier). The
+# nvidia-t4-small / MI300X variants live alongside as build args
+# to switch when the Space is upgraded.
+#
 # Bakes:
 # - Python 3.12 + pip deps (~2.5 GB once torch is in)
-# - Ollama + granite4.1:3b model (~2 GB)
+# - Ollama + granite4.1:3b model (~2 GB) → 3b only on cpu-basic.
+#   RIPRAP_OLLAMA_8B_TAG=granite4.1:3b aliases the 8b reconciler
+#   calls to 3b so the polished UI runs end-to-end without 8b's
+#   ~5 GB image cost. Quality drops vs 8b; speed lever is the
+#   vLLM-on-AMD-MI300X demo path (RIPRAP_LLM_PRIMARY=vllm).
 # - All pre-computed fixtures in data/ + corpus/
 #
 # Runtime:
-# - Ollama daemon serves Granite 4.1
+# - Ollama daemon serves Granite 4.1:3b
 # - Granite Embedding 278M auto-downloads via sentence-transformers
 #   on first FastAPI startup (~280 MB) → cached to /home/user/.cache
 # - uvicorn FastAPI on port 7860 (HF default)
@@ -29,8 +37,11 @@ ENV HOME=/home/user \
     PYTHONUNBUFFERED=1 \
     HF_HOME=/home/user/.cache/huggingface \
     OLLAMA_HOST=127.0.0.1:11434 \
-    OLLAMA_NUM_PARALLEL= \
-    OLLAMA_KEEP_ALIVE=24h
+    OLLAMA_NUM_PARALLEL=1 \
+    OLLAMA_KEEP_ALIVE=24h \
+    RIPRAP_LLM_PRIMARY=ollama \
+    RIPRAP_OLLAMA_8B_TAG=granite4.1:3b \
+    RIPRAP_MELLEA_MAX_ATTEMPTS=2

 # Install Ollama (single-binary install)
 RUN curl -fsSL https://ollama.com/install.sh | sh