Space: ub-aac-chatbot/aac-chatbot

shwetangisingh committed 17 days ago

Commit 4a7c575
1 Parent(s): 0e19ba2

add Dockerfile + HF Space frontmatter for hosted deploy

Multi-stage Dockerfile: Node 22 + pnpm builds the frontend in stage 1,
Python 3.12 + CPU-only torch + sentence-transformers serves it via
FastAPI in stage 2. The backend serves the built dist/ as static files,
so it's one container, one process, one port.

requirements-docker.txt pins the PyTorch CPU wheel index so the build
doesn't pull ~2GB of unusable CUDA wheels on HF Spaces' free CPU instance.
The base requirements.txt stays platform-neutral for local conda dev.

README gets HF Space YAML frontmatter (sdk: docker, app_port: 7860) and
a hosting section walking through both local docker run and the HF push.
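
Hugging Face reads the Space configuration from the YAML block between the two `---` lines at the top of README.md. As a rough illustration of what that block carries, the sketch below pulls the keys out with a deliberately naive `key: value` parser. The helper name `read_frontmatter` is hypothetical and not part of this commit, and real YAML (nesting, quoting, lists) would need a proper parser.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Return the key/value pairs of a leading '---' block, or {} if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":      # closing delimiter ends the block
            return fields
        key, sep, value = line.partition(":")
        if sep:                        # skip lines without a colon
            fields[key.strip()] = value.strip()
    return {}                          # no closing '---': not valid frontmatter


# Trimmed-down copy of the frontmatter added in this commit.
readme = """---
title: Multimodal AAC Chatbot
sdk: docker
app_port: 7860
---
# Multimodal AAC Chatbot
"""
config = read_frontmatter(readme)
print(config["sdk"], config["app_port"])
```

Values come back as strings, so a caller that needs the port as a number would convert `config["app_port"]` with `int()`.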

Files changed (5)

.dockerignore +43 -0
Dockerfile +72 -0
README.md +50 -0
requirements-docker.txt +12 -0
requirements.txt +4 -0

.dockerignore ADDED

	@@ -0,0 +1,43 @@

+.git
+.gitignore
+.github
+.vscode
+.idea
+.claude
+.code-review-graph
+.ruff_cache
+.pre-commit-config.yaml
+ruff.toml
+# Keep build deterministic — rebuild indexes inside the container
+data/vector_store/
+data/pick_index/
+data/faiss_store/
+# Logs and ephemeral state
+logs/
+mlflow.db
+mlruns/
+*.csv
+# Local dev artefacts
+**/__pycache__/
+**/*.pyc
+**/*.pyo
+**/.pytest_cache
+**/.mypy_cache
+# Frontend build artefacts (rebuilt in Docker stage 1)
+frontend/node_modules/
+frontend/dist/
+# Misc
+.DS_Store
+*.swp
+.env
+ProjectDetails.pdf
+docs/
+README.md
+lessons.md
+references.md

Dockerfile ADDED

	@@ -0,0 +1,72 @@

+# ── Stage 1: build the React frontend ────────────────────────────────────────
+FROM node:22-slim AS frontend
+WORKDIR /app/frontend
+# pnpm via corepack (ships with Node 22)
+RUN corepack enable
+COPY frontend/package.json frontend/pnpm-lock.yaml ./
+RUN pnpm install --frozen-lockfile
+COPY frontend/ ./
+RUN pnpm build
+# ── Stage 2: Python runtime ──────────────────────────────────────────────────
+FROM python:3.12-slim
+# HF_HOME points at a writable cache dir for transformers/sentence-transformers.
+# On HF Spaces the default $HOME is read-only at runtime, so we explicitly
+# steer the model cache somewhere writable.
+ENV PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1 \
+    PIP_NO_CACHE_DIR=1 \
+    HF_HOME=/tmp/hf_cache \
+    XDG_CACHE_HOME=/tmp/.cache
+WORKDIR /app
+# System deps for torch + sentence-transformers (most are already in slim).
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+        build-essential \
+        curl \
+    && rm -rf /var/lib/apt/lists/*
+# Install Python deps via a Docker-specific requirements file that pins torch
+# to the CPU-only wheel index. The base requirements.txt stays platform-neutral
+# so local conda dev (./setup.sh) keeps using whatever torch flavor your OS
+# wants (MPS on macOS, CUDA on Linux+GPU); this image deliberately uses CPU
+# only because HF Spaces' free CPU instance can't use CUDA anyway.
+COPY requirements.txt requirements-docker.txt ./
+RUN pip install --upgrade pip \
+    && pip install --retries 5 --timeout 120 -r requirements-docker.txt
+# Copy the backend + persona source data.
+COPY backend/ ./backend/
+COPY data/memories/ ./data/memories/
+COPY data/users.json ./data/users.json
+COPY data/generate_users.py ./data/generate_users.py
+# Build per-user vector indexes inside the image (downloads BGE on first run).
+# This bakes the indexes into the image so first-request latency is just the
+# model warm-up, not a fresh BGE encode of every persona.
+RUN python -m backend.retrieval.vector_store
+# Pull the built static frontend from stage 1.
+COPY --from=frontend /app/frontend/dist ./frontend/dist
+# Pre-create writable directories. HF Spaces filesystem is read-only outside
+# /tmp at runtime, so logs default to /tmp; locally you can override LOGS_DIR
+# via env to anything mounted/writable.
+RUN mkdir -p /tmp/logs /tmp/hf_cache /tmp/.cache /tmp/pick_index \
+    && chmod -R 777 /tmp/logs /tmp/hf_cache /tmp/.cache /tmp/pick_index
+ENV LOGS_DIR=/tmp/logs
+# HF Spaces expects 7860 by default; respects $PORT for local docker run.
+ENV PORT=7860
+EXPOSE 7860
+# sh -c expands $PORT at runtime so the same image runs both on HF (port 7860,
+# unset PORT or PORT=7860) and locally (e.g. `docker run -e PORT=8000 ...`).
+CMD sh -c "uvicorn backend.api.main:app --host 0.0.0.0 --port ${PORT:-7860}"
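
The Dockerfile resolves two runtime settings from the environment: the port (`${PORT:-7860}`) and the log directory (`LOGS_DIR=/tmp/logs`). The sketch below mirrors that fallback logic in Python. Both helper names are hypothetical, and how the backend actually reads `LOGS_DIR` is not shown in this diff, so treat the second function as an assumption.

```python
from pathlib import Path
from typing import Mapping


def resolve_port(env: Mapping[str, str], default: int = 7860) -> int:
    """Mirror the shell's ${PORT:-7860}: fall back when PORT is unset or empty."""
    raw = env.get("PORT", "")
    return int(raw) if raw else default


def resolve_logs_dir(env: Mapping[str, str], default: str = "/tmp/logs") -> Path:
    """Use LOGS_DIR when set and non-empty, otherwise the Dockerfile's default."""
    return Path(env.get("LOGS_DIR") or default)


print(resolve_port({"PORT": "8000"}))  # as with `docker run -e PORT=8000`
print(resolve_port({}))                # PORT unset, so the 7860 default applies
print(resolve_logs_dir({}))
```

Note that `:-` in the shell treats an empty `PORT` the same as an unset one, which is why the sketch tests for a non-empty string rather than key presence.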

README.md CHANGED

@@ -1,3 +1,14 @@
+---
+title: Multimodal AAC Chatbot
+emoji: 🌸
+colorFrom: pink
+colorTo: indigo
+sdk: docker
+app_port: 7860
+pinned: false
+license: other
+---
 # Multimodal AAC Chatbot
 A chatbot that **speaks as an AAC user, not to them.** You pick a persona — fourteen are shipped, anchored in real memoirs and canonical fiction — and the partner talks to them. The bot replies in that person's voice, using their memories, and adjusts what it says based on what the webcam sees: facial expression, hand gestures, where they're looking, and letters they trace in the air.
@@ -14,6 +25,7 @@ It's a training-free agentic RAG pipeline — a plain Python function chain with
 - [Setup](#setup)
 - [Configuration](#configuration)
 - [Running the Project](#running-the-project)
+- [Hosting](#hosting)
 - [Project Structure](#project-structure)
 - [Personas](#personas)
 - [Team](#team)
@@ -318,6 +330,44 @@ Output covers latency quantiles + SLO pass rate, faithfulness (groundedness / ha
 ---
+## Hosting
+The project ships with a single [Dockerfile](Dockerfile) that builds the React frontend in stage 1 (Node 22 + pnpm) and runs the FastAPI backend in stage 2 (Python 3.12 + torch + sentence-transformers). The backend serves the built `frontend/dist/` as static files, so it's one container, one process, one port.
+The same image runs identically in two places.
+### Locally (for development that mirrors production)
+```bash
+docker build -t aac-chatbot .
+docker run --rm -p 8000:8000 -e PORT=8000 --env-file .env aac-chatbot
+# → http://localhost:8000
+```
+The `--env-file .env` injects your Ollama Cloud key + endpoints (same `.env` you use for `./run.sh`). Conda + `./run.sh` is still the fastest dev loop because it hot-reloads; the docker path is for when you want byte-identical-to-production behaviour.
+### On Hugging Face Spaces (public URL for graders)
+The repo doubles as an HF Space — `README.md` carries the YAML frontmatter HF needs (`sdk: docker`, `app_port: 7860`).
+1. Create a new Space on huggingface.co (Docker SDK, public).
+2. Add this repo as a remote:
+   ```bash
+   git remote add space https://huggingface.co/spaces/<your-username>/aac-chatbot
+   git push space main
+   ```
+3. In the Space's *Settings → Variables and secrets*, add the LLM-tier secrets (don't commit them):
+   - `PRIMARY_API_KEY`, `PRIMARY_BASE_URL`, `PRIMARY_MODEL`
+   - `FALLBACK_API_KEY`, `FALLBACK_BASE_URL`, `FALLBACK_MODEL`
+   - `INK_VISION_API_KEY`, `INK_VISION_BASE_URL`, `INK_VISION_MODEL`
+4. The Space rebuilds the Dockerfile on every push. First build takes ~5-8 min (downloads BGE + builds vector indexes for all personas); subsequent builds reuse Docker layer cache and finish in 2-3 min.
+The deployed instance won't persist `logs/` or `data/pick_index/` across container restarts (HF Spaces filesystem is read-only outside `/tmp`). For the writeup, your local logs are the source of truth — the Space is just a click-around demo for graders.
+**Webcam note.** `getUserMedia` requires a secure context. HF Spaces serves over HTTPS and browsers treat `localhost` as secure, but a plain-HTTP LAN IP is not, so don't try to demo from a LAN IP without an HTTPS tunnel.
+---
 ## Project Structure
 ```
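
The hosting section lists nine variables to add under the Space's secrets. A startup check along the following lines could flag any that are absent before the first request fails. This is a sketch only: `missing_secrets` is a hypothetical helper, and treating all nine as mandatory is an assumption (the backend may tolerate a missing fallback tier).

```python
import os
from typing import Mapping

# Variable names as listed in the README's hosting section.
REQUIRED_SECRETS = tuple(
    f"{tier}_{suffix}"
    for tier in ("PRIMARY", "FALLBACK", "INK_VISION")
    for suffix in ("API_KEY", "BASE_URL", "MODEL")
)


def missing_secrets(env: Mapping[str, str]) -> list[str]:
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


missing = missing_secrets(os.environ)
if missing:
    print("missing secrets:", ", ".join(missing))
```

Passing the environment in as a mapping keeps the check easy to exercise with a plain dict instead of the real process environment.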

requirements-docker.txt ADDED

	@@ -0,0 +1,12 @@

+# Docker / HF Space install — uses CPU-only torch to avoid pulling ~2GB of
+# CUDA wheels we can't use on HF Spaces' free CPU instance.
+#
+# The PyTorch CPU index serves wheels with a `+cpu` local-version tag.
+# pip ranks candidates from both indexes together; at the same release, the
+# `+cpu` local version sorts higher, so torch comes from the CPU index.
+--index-url https://download.pytorch.org/whl/cpu
+--extra-index-url https://pypi.org/simple
+# Everything from the standard requirements file. Torch will resolve to the
+# CPU wheel from the index above; the rest from PyPI.
+-r requirements.txt
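
One way to confirm inside the built image that torch really resolved to the CPU wheel is to look for the `+cpu` local-version tag on the installed package. The sketch below does that with the standard library only. Both function names are hypothetical, and the check assumes the Linux wheels from the CPU index carry the tag, as the comment above states; on other platforms a missing tag is not proof of a CUDA build.

```python
from importlib import metadata
from typing import Optional


def is_cpu_build(version: str) -> bool:
    """True when the version carries the `+cpu` local tag used by the CPU index."""
    _, _, local = version.partition("+")
    return local == "cpu"


def installed_torch_is_cpu() -> Optional[bool]:
    """Check the installed torch; None means torch is not installed at all."""
    try:
        return is_cpu_build(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        return None


print(installed_torch_is_cpu())
```

Running this as a one-off command in the container is cheaper than importing torch itself, since it only reads package metadata.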

requirements.txt CHANGED

@@ -3,6 +3,10 @@ openai>=1.0          # talks to Ollama Cloud over OpenAI-compatible HTTP
 # ── Retrieval ──────────────────────────────────────────────────────────────────
 sentence-transformers>=3.0
+# torch is installed separately in the Dockerfile from the CPU-only wheel
+# index. For local conda dev (./setup.sh) we use the default platform torch
+# (which on macOS gives MPS acceleration). The constraint is satisfied either
+# way; the comment is here so a fresh reader doesn't try to pin it.
 torch>=2.0
 transformers>=4.40
 numpy>=1.24