seriffic committed · Commit 43f0938 · 1 parent: e8a6c67

Dockerfile: CPU-tuned env for HF Spaces cpu-basic


cpu-basic doesn't have enough disk for both the 3b and 8b models
alongside the torch + EO deps. Alias the 8b reconciler calls to 3b via
RIPRAP_OLLAMA_8B_TAG so the polished planner/Mellea/intents pipeline
runs end-to-end.

RIPRAP_LLM_PRIMARY=ollama – explicit (default)
RIPRAP_OLLAMA_8B_TAG=granite4.1:3b – 8b alias remap
RIPRAP_MELLEA_MAX_ATTEMPTS=2 – fewer rerolls on slow CPU
OLLAMA_NUM_PARALLEL=1 – don't load a 2nd 3b copy

Reconciler quality drops vs 8b; the AMD MI300X vLLM path
(RIPRAP_LLM_PRIMARY=vllm) is the speed/quality lever for the demo.
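
For context, a minimal sketch of how an env-var alias like RIPRAP_OLLAMA_8B_TAG can be resolved at call time. The resolve_model_tag helper and DEFAULT_TAGS table below are hypothetical illustrations, not the actual Riprap wiring:

import os

# Hypothetical default role -> Ollama tag table; the real codebase
# may organize this differently.
DEFAULT_TAGS = {
    "3b": "granite4.1:3b",
    "8b": "granite4.1:8b",
}

def resolve_model_tag(role: str) -> str:
    """Map a logical model role ("3b"/"8b") to a concrete Ollama tag.

    Setting RIPRAP_OLLAMA_8B_TAG=granite4.1:3b makes every 8b
    reconciler call land on the 3b model, so only one model image
    has to fit on cpu-basic's disk.
    """
    override = os.environ.get(f"RIPRAP_OLLAMA_{role.upper()}_TAG")
    return override or DEFAULT_TAGS[role]

# With the Dockerfile's ENV in place:
#   resolve_model_tag("8b")  ->  "granite4.1:3b"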

Files changed (1)
  Dockerfile  +15 -4
Dockerfile CHANGED
@@ -1,12 +1,20 @@
 # Riprap – Hugging Face Spaces (Docker SDK) deployment.
 #
+# CPU-tuned variant for HF Spaces cpu-basic (free tier). The
+# nvidia-t4-small / MI300X variants live alongside as build args
+# to switch when the Space is upgraded.
+#
 # Bakes:
 # - Python 3.12 + pip deps (~2.5 GB once torch is in)
-# - Ollama + granite4.1:3b model (~2 GB)
+# - Ollama + granite4.1:3b model (~2 GB) – 3b only on cpu-basic.
+#   RIPRAP_OLLAMA_8B_TAG=granite4.1:3b aliases the 8b reconciler
+#   calls to 3b so the polished UI runs end-to-end without 8b's
+#   ~5 GB image cost. Quality drops vs 8b; speed lever is the
+#   vLLM-on-AMD-MI300X demo path (RIPRAP_LLM_PRIMARY=vllm).
 # - All pre-computed fixtures in data/ + corpus/
 #
 # Runtime:
-# - Ollama daemon serves Granite 4.1
+# - Ollama daemon serves Granite 4.1:3b
 # - Granite Embedding 278M auto-downloads via sentence-transformers
 #   on first FastAPI startup (~280 MB) – cached to /home/user/.cache
 # - uvicorn FastAPI on port 7860 (HF default)
@@ -29,8 +37,11 @@ ENV HOME=/home/user \
     PYTHONUNBUFFERED=1 \
     HF_HOME=/home/user/.cache/huggingface \
     OLLAMA_HOST=127.0.0.1:11434 \
-    OLLAMA_NUM_PARALLEL=2 \
-    OLLAMA_KEEP_ALIVE=24h
+    OLLAMA_NUM_PARALLEL=1 \
+    OLLAMA_KEEP_ALIVE=24h \
+    RIPRAP_LLM_PRIMARY=ollama \
+    RIPRAP_OLLAMA_8B_TAG=granite4.1:3b \
+    RIPRAP_MELLEA_MAX_ATTEMPTS=2
 
 # Install Ollama (single-binary install)
 RUN curl -fsSL https://ollama.com/install.sh | sh
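
As a smoke test once the Space boots, Ollama's standard /api/tags endpoint lists the models baked into the image; a quick check like the one below (illustrative, not part of this commit) confirms only the 3b tag is present:

import json
import urllib.request

# Ask the local Ollama daemon (OLLAMA_HOST=127.0.0.1:11434 in the
# Dockerfile) which models are installed.
with urllib.request.urlopen("http://127.0.0.1:11434/api/tags") as resp:
    models = [m["name"] for m in json.load(resp)["models"]]

# Expect only the 3b tag on cpu-basic; the 8b tag should be absent
# because 8b reconciler calls are aliased to 3b via env.
print(models)
assert "granite4.1:3b" in models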