shwetangisingh committed 4a7c575 · 1 parent: 0e19ba2

add Dockerfile + HF Space frontmatter for hosted deploy

Multi-stage Dockerfile: Node 22 + pnpm builds the frontend in stage 1,
Python 3.12 + CPU-only torch + sentence-transformers serves it via
FastAPI in stage 2. The backend serves the built dist/ as static files,
so it's one container, one process, one port.
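
The commit does not show the backend code, so the following is only a sketch of the lookup a static-file mount performs, written in stdlib Python. `resolve_static` is a hypothetical helper, and the fallback to `index.html` for unknown routes is an assumption about how a single-page frontend would be served. A FastAPI app would more likely mount `StaticFiles` than hand-roll this.

```python
from pathlib import Path
from typing import Optional


def resolve_static(dist_dir: Path, url_path: str) -> Optional[Path]:
    """Map a request path to a file under dist_dir (hypothetical helper).

    Returns the matching file, falls back to index.html for unknown
    routes (assumed single-page-app behaviour), and returns None when
    the build output is missing entirely.
    """
    root = dist_dir.resolve()
    candidate = (root / url_path.lstrip("/")).resolve()
    # Refuse anything that escapes the dist directory (e.g. "../secret").
    inside = candidate == root or root in candidate.parents
    if inside and candidate.is_file():
        return candidate
    index = root / "index.html"
    return index if index.is_file() else None
```

Because API routes and static files share one process, a single port is enough: anything that is not an API route can go through a lookup like this one.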

requirements-docker.txt pins the PyTorch CPU wheel index so the build
doesn't pull ~2GB of unusable CUDA wheels on HF Spaces' free CPU instance.
The base requirements.txt stays platform-neutral for local conda dev.
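
One way to confirm that the CPU wheel was actually installed is to look at the local-version tag of the installed torch. The comment in requirements-docker.txt says the CPU index serves wheels tagged `+cpu`; the helper below is a hypothetical sketch that relies on that tag and nothing else.

```python
def is_cpu_wheel(version: str) -> bool:
    """True when a version string carries the `+cpu` local-version tag."""
    _, sep, local = version.partition("+")
    return sep == "+" and local == "cpu"
```

Inside the container this would be called as `is_cpu_wheel(torch.__version__)`. Checking that `torch.version.cuda` is `None` is another signal. Default macOS wheels carry no local tag, so the helper is only meaningful for the Docker image.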

README gets HF Space YAML frontmatter (sdk: docker, app_port: 7860) and
a hosting section walking through both local docker run and the HF push.
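
Since `app_port` in the frontmatter has to agree with the port the container listens on, a small consistency check can be useful. The sketch below reads a flat `key: value` frontmatter block; it is deliberately not a YAML parser, and `read_frontmatter` and `README_HEAD` are hypothetical names used only for illustration.

```python
def read_frontmatter(text: str) -> dict:
    """Parse a flat `key: value` block delimited by `---` lines.

    Nested values, lists and quoting are out of scope for this sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter, so not treated as frontmatter


# Abbreviated copy of the frontmatter this commit adds to README.md.
README_HEAD = """\
---
title: Multimodal AAC Chatbot
sdk: docker
app_port: 7860
---

# Multimodal AAC Chatbot
"""

fields = read_frontmatter(README_HEAD)
```

A pre-push check could then compare `int(fields["app_port"])` against the default port used in the Dockerfile.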

Files changed (5)
  1. .dockerignore +43 -0
  2. Dockerfile +72 -0
  3. README.md +50 -0
  4. requirements-docker.txt +12 -0
  5. requirements.txt +4 -0
.dockerignore ADDED
@@ -0,0 +1,43 @@
+.git
+.gitignore
+.github
+.vscode
+.idea
+.claude
+.code-review-graph
+.ruff_cache
+.pre-commit-config.yaml
+ruff.toml
+
+# Keep build deterministic: rebuild indexes inside the container
+data/vector_store/
+data/pick_index/
+data/faiss_store/
+
+# Logs and ephemeral state
+logs/
+mlflow.db
+mlruns/
+*.csv
+
+# Local dev artefacts
+**/__pycache__/
+**/*.pyc
+**/*.pyo
+**/.pytest_cache
+**/.mypy_cache
+
+# Frontend build artefacts (rebuilt in Docker stage 1)
+frontend/node_modules/
+frontend/dist/
+
+# Misc
+.DS_Store
+*.swp
+.env
+ProjectDetails.pdf
+docs/
+
+README.md
+lessons.md
+references.md
Dockerfile ADDED
@@ -0,0 +1,72 @@
+# ── Stage 1: build the React frontend ────────────────────────────────────────
+FROM node:22-slim AS frontend
+
+WORKDIR /app/frontend
+
+# pnpm via corepack (ships with Node 22)
+RUN corepack enable
+
+COPY frontend/package.json frontend/pnpm-lock.yaml ./
+RUN pnpm install --frozen-lockfile
+
+COPY frontend/ ./
+RUN pnpm build
+
+# ── Stage 2: Python runtime ──────────────────────────────────────────────────
+FROM python:3.12-slim
+
+# HF_HOME points at a writable cache dir for transformers/sentence-transformers.
+# On HF Spaces the default $HOME is read-only at runtime, so we explicitly
+# steer the model cache somewhere writable.
+ENV PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1 \
+    PIP_NO_CACHE_DIR=1 \
+    HF_HOME=/tmp/hf_cache \
+    XDG_CACHE_HOME=/tmp/.cache
+
+WORKDIR /app
+
+# System deps for torch + sentence-transformers (most are already in slim).
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+        build-essential \
+        curl \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install Python deps via a Docker-specific requirements file that pins torch
+# to the CPU-only wheel index. The base requirements.txt stays platform-neutral
+# so local conda dev (./setup.sh) keeps using whatever torch flavor your OS
+# wants (MPS on macOS, CUDA on Linux+GPU); this image deliberately uses CPU
+# only because HF Spaces' free CPU instance can't use CUDA anyway.
+COPY requirements.txt requirements-docker.txt ./
+RUN pip install --upgrade pip \
+    && pip install --retries 5 --timeout 120 -r requirements-docker.txt
+
+# Copy the backend + persona source data.
+COPY backend/ ./backend/
+COPY data/memories/ ./data/memories/
+COPY data/users.json ./data/users.json
+COPY data/generate_users.py ./data/generate_users.py
+
+# Build per-user vector indexes inside the image (downloads BGE on first run).
+# This bakes the indexes into the image so first-request latency is just the
+# model warm-up, not a fresh BGE encode of every persona.
+RUN python -m backend.retrieval.vector_store
+
+# Pull the built static frontend from stage 1.
+COPY --from=frontend /app/frontend/dist ./frontend/dist
+
+# Pre-create writable directories. HF Spaces filesystem is read-only outside
+# /tmp at runtime, so logs default to /tmp; locally you can override LOGS_DIR
+# via env to anything mounted/writable.
+RUN mkdir -p /tmp/logs /tmp/hf_cache /tmp/.cache /tmp/pick_index \
+    && chmod -R 777 /tmp/logs /tmp/hf_cache /tmp/.cache /tmp/pick_index
+ENV LOGS_DIR=/tmp/logs
+
+# HF Spaces expects 7860 by default; respects $PORT for local docker run.
+ENV PORT=7860
+EXPOSE 7860
+
+# sh -c expands $PORT at runtime so the same image runs both on HF (port 7860,
+# unset PORT or PORT=7860) and locally (e.g. `docker run -e PORT=8000 ...`).
+CMD sh -c "uvicorn backend.api.main:app --host 0.0.0.0 --port ${PORT:-7860}"
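
The `${PORT:-7860}` expansion in the `CMD` line falls back to 7860 when `PORT` is unset or empty. The same rule, together with the `LOGS_DIR` default, can be sketched in Python. The helper names are hypothetical, and whether the backend reads these variables this way is an assumption; the commit only shows the Dockerfile side.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_PORT = 7860             # matches ENV PORT / EXPOSE in the Dockerfile
DEFAULT_LOGS_DIR = "/tmp/logs"  # matches ENV LOGS_DIR in the Dockerfile


def resolve_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Mirror ${PORT:-7860}: use the default when PORT is unset or empty."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "")
    return int(raw) if raw else DEFAULT_PORT


def resolve_logs_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the log directory, creating it if it does not exist yet."""
    env = os.environ if env is None else env
    path = Path(env.get("LOGS_DIR") or DEFAULT_LOGS_DIR)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Passing a plain dict instead of `os.environ` keeps both helpers easy to exercise without touching the real environment, for example `resolve_port({"PORT": "8000"})`.
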
README.md CHANGED
@@ -1,3 +1,14 @@
+---
+title: Multimodal AAC Chatbot
+emoji: 🌸
+colorFrom: pink
+colorTo: indigo
+sdk: docker
+app_port: 7860
+pinned: false
+license: other
+---
+
 # Multimodal AAC Chatbot
 
 A chatbot that **speaks as an AAC user, not to them.** You pick a persona (fourteen are shipped, anchored in real memoirs and canonical fiction) and the partner talks to them. The bot replies in that person's voice, using their memories, and adjusts what it says based on what the webcam sees: facial expression, hand gestures, where they're looking, and letters they trace in the air.
@@ -14,6 +25,7 @@ It's a training-free agentic RAG pipeline, a plain Python function chain with
 - [Setup](#setup)
 - [Configuration](#configuration)
 - [Running the Project](#running-the-project)
+- [Hosting](#hosting)
 - [Project Structure](#project-structure)
 - [Personas](#personas)
 - [Team](#team)
@@ -318,6 +330,44 @@ Output covers latency quantiles + SLO pass rate, faithfulness (groundedness / ha
 
 ---
 
+## Hosting
+
+The project ships with a single [Dockerfile](Dockerfile) that builds the React frontend in stage 1 (Node 22 + pnpm) and runs the FastAPI backend in stage 2 (Python 3.12 + torch + sentence-transformers). The backend serves the built `frontend/dist/` as static files, so it's one container, one process, one port.
+
+The same image runs identically in two places.
+
+### Locally (for development that mirrors production)
+
+```bash
+docker build -t aac-chatbot .
+docker run --rm -p 8000:8000 -e PORT=8000 --env-file .env aac-chatbot
+# → http://localhost:8000
+```
+
+The `--env-file .env` injects your Ollama Cloud key + endpoints (same `.env` you use for `./run.sh`). Conda + `./run.sh` is still the fastest dev loop because it hot-reloads; the docker path is for when you want byte-identical-to-production behaviour.
+
+### On Hugging Face Spaces (public URL for graders)
+
+The repo doubles as an HF Space: `README.md` carries the YAML frontmatter HF needs (`sdk: docker`, `app_port: 7860`).
+
+1. Create a new Space on huggingface.co (Docker SDK, public).
+2. Add this repo as a remote:
+   ```bash
+   git remote add space https://huggingface.co/spaces/<your-username>/aac-chatbot
+   git push space main
+   ```
+3. In the Space's *Settings → Variables and secrets*, add the LLM-tier secrets (don't commit them):
+   - `PRIMARY_API_KEY`, `PRIMARY_BASE_URL`, `PRIMARY_MODEL`
+   - `FALLBACK_API_KEY`, `FALLBACK_BASE_URL`, `FALLBACK_MODEL`
+   - `INK_VISION_API_KEY`, `INK_VISION_BASE_URL`, `INK_VISION_MODEL`
+4. The Space rebuilds the Dockerfile on every push. First build takes ~5-8 min (downloads BGE + builds vector indexes for all personas); subsequent builds reuse Docker layer cache and finish in 2-3 min.
+
+The deployed instance won't persist `logs/` or `data/pick_index/` across container restarts (HF Spaces filesystem is read-only outside `/tmp`). For the writeup, your local logs are the source of truth; the Space is just a click-around demo for graders.
+
+**Webcam note.** `getUserMedia` requires a secure context: HTTPS, which HF Spaces provides, or `localhost`, which browsers treat as secure even over plain HTTP. A page served over HTTP from a LAN IP doesn't qualify, so don't try to demo from a LAN IP without a tunnel.
+
+---
+
 ## Project Structure
requirements-docker.txt ADDED
@@ -0,0 +1,12 @@
+# Docker / HF Space install: uses CPU-only torch to avoid pulling ~2GB of
+# CUDA wheels we can't use on HF Spaces' free CPU instance.
+#
+# The PyTorch CPU index serves wheels with a `+cpu` local-version tag.
+# pip ignores index order, but `X.Y.Z+cpu` outranks plain `X.Y.Z` of the same
+# release, so torch resolves to the CPU wheel; PyPI covers everything else.
+--index-url https://download.pytorch.org/whl/cpu
+--extra-index-url https://pypi.org/simple
+
+# Everything from the standard requirements file. Torch will resolve to the
+# CPU wheel from the index above; the rest from PyPI.
+-r requirements.txt
requirements.txt CHANGED
@@ -3,6 +3,10 @@ openai>=1.0 # talks to Ollama Cloud over OpenAI-compatible HTTP
 
 # ── Retrieval ──────────────────────────────────────────────────────────────────
 sentence-transformers>=3.0
+# torch is installed separately in the Dockerfile from the CPU-only wheel
+# index. For local conda dev (./setup.sh) we use the default platform torch
+# (which on macOS gives MPS acceleration). The constraint is satisfied either
+# way; the comment is here so a fresh reader doesn't try to pin it.
 torch>=2.0
 transformers>=4.40
 numpy>=1.24