
Hugging Face Spaces Deployment

How we deployed the PaperHawk Streamlit application as a public Hugging Face Space, with the AMD MI300X vLLM endpoint as its inference backend.


What you get

  • Public Space URL — a Streamlit app anyone can use in a browser, no signup
  • Free CPU Basic tier — 16 GB RAM, 2 vCPU. The app runs here; the LLM runs on an AMD MI300X droplet via vLLM (a separate cloud instance)
  • Two paths: under the lablab-ai-amd-developer-hackathon org (Plan A — qualifies for the HF Special Prize), or under your personal account (Plan B — the fallback if the org hits hardware-quota issues)

Live example: https://huggingface.co/spaces/Vincsipe/paperhawk


Prerequisites

  1. Hugging Face account (free)
  2. Optional: membership in the lablab-ai-amd-developer-hackathon org if submitting to the AMD Developer Hackathon (Plan A). The HF Special Prize requires the Space to live under this org.
  3. A running vLLM endpoint on AMD MI300X — see AMD_DEPLOYMENT.md
  4. The PaperHawk repo cloned locally with Dockerfile, README.md, and app/main.py

Step 1 — Create the Space

Go to https://huggingface.co/new-space (or, if you're an org member, click + New → New Space from the org page).

Configuration:

| Field | Value |
| --- | --- |
| Owner | lablab-ai-amd-developer-hackathon (Plan A) or your personal handle (Plan B) |
| Space name | paperhawk |
| Short description | Real-DI-Audit/14 rules/6 anti-halluc/LangGraph/Qwen/MI300X |
| License | mit |
| Space SDK | Docker (not Streamlit, not Gradio — see step 2) |
| Template | Blank (we ship our own Dockerfile) |
| Hardware | CPU Basic (free, 16 GB RAM) |
| Visibility | Public (required for the HF Special Prize) |

Click Create Space. You'll get an empty repo at:

https://huggingface.co/spaces/<owner>/paperhawk

Why Docker SDK and not the Streamlit template? As of 2026, the HF Spaces "Streamlit" SDK is effectively a managed template under the Docker tab. We bypass the template because PaperHawk needs custom OS dependencies (Tesseract OCR for EN/HU/DE, poppler-utils for table extraction, libmupdf for PDFs) that the templated builder doesn't include. Our own Dockerfile is faster to debug and gives us a deterministic base image.


Step 2 — Configure the Dockerfile for HF Spaces

The PaperHawk Dockerfile is HF-Spaces-ready out of the box, with one critical detail: port 7860.

```dockerfile
# syntax=docker/dockerfile:1.6
FROM python:3.12-slim AS base

ENV PYTHONUNBUFFERED=1 PYTHONDONTWRITEBYTECODE=1

# OS deps
RUN apt-get update && apt-get install -y --no-install-recommends \
    tesseract-ocr tesseract-ocr-eng tesseract-ocr-hun tesseract-ocr-deu \
    poppler-utils libmupdf-dev curl \
 && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt .
RUN pip install --upgrade pip \
 && pip install --index-url https://download.pytorch.org/whl/cpu torch \
 && pip install -r requirements.txt

# Pre-download the embedding model so the first user request isn't slow
RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('BAAI/bge-m3')"

COPY . .

# HF Spaces expects port 7860 (NOT Streamlit's default 8501)
EXPOSE 7860
CMD ["streamlit", "run", "app/main.py", \
     "--server.address=0.0.0.0", \
     "--server.port=7860", \
     "--server.headless=true"]
```

Why 7860? HF Spaces' Docker hosting routes public traffic to port 7860 by default (a different port can be declared via app_port in the README front-matter, but 7860 is the convention); Streamlit's default 8501 is invisible to the public URL. This is a one-line fix that's easy to miss.
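Because a wrong port only surfaces as a failed build minutes later, a pre-push lint is worth having. A minimal sketch (the check_hf_port helper is ours, not part of any HF or Streamlit tooling) that scans a Dockerfile's text for the two places the port must appear:

```python
import re

REQUIRED_PORT = "7860"  # the only port HF Spaces' Docker hosting routes by default

def check_hf_port(dockerfile_text: str) -> list[str]:
    """Return a list of problems; an empty list means the Dockerfile looks Space-ready."""
    problems = []
    # EXPOSE must advertise 7860
    if not re.search(rf"^EXPOSE\s+{REQUIRED_PORT}\b", dockerfile_text, re.MULTILINE):
        problems.append("missing 'EXPOSE 7860'")
    # Streamlit must actually bind 7860, not its default 8501
    if f"--server.port={REQUIRED_PORT}" not in dockerfile_text:
        problems.append("streamlit is not started with --server.port=7860")
    if "8501" in dockerfile_text:
        problems.append("found Streamlit's default port 8501 - double-check it")
    return problems

good = 'EXPOSE 7860\nCMD ["streamlit", "run", "app/main.py", "--server.port=7860"]'
bad = 'EXPOSE 8501\nCMD ["streamlit", "run", "app/main.py"]'
```

Running check_hf_port over the Dockerfile above returns an empty list; the "bad" variant trips all three checks.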


Step 3 — Configure the README YAML front-matter

HF Spaces reads the YAML block at the top of README.md to configure the Space card and build behavior. PaperHawk's:

```yaml
---
title: PaperHawk
emoji: 🦅
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
license: mit
short_description: Real-DI-Audit/14 rules/6 anti-halluc/LangGraph/Qwen/MI300X
---
```

Critical: colorTo must be one of [red, yellow, green, blue, indigo, purple, pink, gray]. We initially used orange (because the AMD brand color is orange) — HF rejected the YAML as invalid, and the Space card fell back to a generic theme with the YAML rendered as a Markdown table at the top of the page. Fixed by changing to yellow.

If the Space's main page shows a title | PaperHawk table at the top, the YAML is invalid and HF can't parse it β€” check the colorTo value first.
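Since an invalid value degrades the Space silently, a tiny validator catches it before pushing. A stdlib-only sketch (no PyYAML; the allowed-color and SDK lists mirror the values above and are an assumption, not the authoritative HF schema):

```python
ALLOWED_COLORS = {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}
KNOWN_SDKS = {"docker", "gradio", "streamlit", "static"}

def validate_front_matter(readme_text: str) -> list[str]:
    """Check the leading `--- ... ---` block for the mistakes that bit us."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["README does not start with '---'"]
    errors = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front-matter block
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in ("colorFrom", "colorTo") and value not in ALLOWED_COLORS:
            errors.append(f"{key}: '{value}' is not an allowed color")
        if key == "sdk" and value not in KNOWN_SDKS:
            errors.append(f"sdk: '{value}' is not a known SDK")
    return errors
```

validate_front_matter on the block above returns no errors; on our original `colorTo: orange` version it flags exactly that line.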


Step 4 — Set up Git LFS for binary assets

HF Spaces has a strict rule: every binary file (*.png, *.pdf, *.pptx, *.docx, *.jpg, *.mp4) must live in Xet storage via Git LFS, not as a regular Git blob. The cover PNG, the slide PDF, the demo packages — all of these get rejected without LFS.

On your local machine:

```shell
# One-time, in any repo with binary files
sudo apt install git-lfs   # or `brew install git-lfs` on macOS
git lfs install
```

In the PaperHawk repo:

```shell
git lfs track "*.png" "*.pdf" "*.pptx" "*.docx" "*.jpeg" "*.jpg" "*.mp4"
git add .gitattributes
git commit -m "Track binary files via LFS"
```

Important: git lfs track only updates .gitattributes. Existing commits that contain binaries as regular Git blobs are still rejected by HF. Migrate the history:

```shell
git lfs migrate import --include="*.png,*.pdf,*.pptx,*.docx,*.jpeg,*.jpg,*.mp4"
```

This rewrites the branch's history so the binaries become LFS pointers. The next git push uploads them via Xet.

Files over 10 MB: on the free tier, HF Spaces also enforces a 10 MB hard limit per file (in our case even via LFS); any single video over 10 MB will be rejected. Before blaming the limit, confirm the file really is an LFS pointer (git lfs ls-files), since an untracked pattern produces the same rejection. If you have demo videos, keep them as separate uploads on YouTube/Vimeo and link from the Space description.
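To check whether a committed file is an LFS pointer or a raw blob, you can also inspect its first bytes: a pointer is a small text stub that starts with a fixed version line (per the Git LFS pointer-file spec). A sketch:

```python
# First line of every Git LFS pointer file, per the LFS spec
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(first_bytes: bytes) -> bool:
    """True if the checked-in content is an LFS pointer stub, not the real binary."""
    return first_bytes.startswith(LFS_POINTER_PREFIX)

# What a pointer stub looks like vs. a real binary (PNG magic bytes)
pointer = b"version https://git-lfs.github.com/spec/v1\noid sha256:abc...\nsize 9000000\n"
raw_png = b"\x89PNG\r\n\x1a\n..."
```

Note this must be run against the committed blob (e.g. `git show HEAD:assets/cover.png`), not the working-tree copy, because Git LFS smudges the working tree back to the real binary on checkout.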


Step 5 — Add the Space as a git remote and push

```shell
# Add a remote for the Space (token embedded in URL avoids dual auth-prompts)
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxx   # generate at https://huggingface.co/settings/tokens (Write scope, fine-grained, with org access if Plan A)
git remote add space https://<your-hf-username>:${HF_TOKEN}@huggingface.co/spaces/<owner>/paperhawk

# Push to the Space
git push --force space main
```

Why token in URL? Git LFS uses a separate authentication channel from the regular Git push. Without the token in the URL, Git prompts for credentials twice and one of them silently times out. Putting the token in the URL handles both.

The first push uploads ~9 MB of LFS objects (the cover image, slide PDF, sample PDFs, sample DOCX). Subsequent pushes are fast (cached on HF's side).
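In a deploy script, the remote URL shape above can be built programmatically. A hypothetical helper (the function name is ours); keep in mind the token lands in plain text in .git/config, so prefer a short-lived fine-grained token:

```python
def space_remote_url(owner: str, space: str, username: str, token: str) -> str:
    """Authenticated HTTPS remote for an HF Space. The token rides in the URL so
    both the regular Git push and the separate LFS upload authenticate without
    interactive prompts (they use different auth channels)."""
    return f"https://{username}:{token}@huggingface.co/spaces/{owner}/{space}"

url = space_remote_url(
    "lablab-ai-amd-developer-hackathon", "paperhawk", "alice", "hf_dummy"
)
```

Pass the result to `git remote add space <url>` exactly as in the shell block above.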


Step 6 — Add Space secrets

The app reads its LLM provider config from environment variables. In the Space:

Settings (top-right, on the Space page) → Variables and secrets → + New variable for each:

| Key | Value | Type |
| --- | --- | --- |
| LLM_PROFILE | vllm | Variable |
| VLLM_BASE_URL | http://<MI300X_DROPLET_IP>:8000/v1 | Variable |
| VLLM_MODEL | Qwen/Qwen2.5-14B-Instruct | Variable |
| EMBEDDING_MODEL | BAAI/bge-m3 | Variable |
| VLLM_API_KEY | sk-paperhawk-2026 (the same token you passed to vLLM --api-key) | Secret |

The VLLM_API_KEY must be a Secret, not a Variable — Secrets are masked in the UI and not exposed via the public Space metadata.
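On the app side, these names map straight onto an OpenAI-compatible client configuration. A sketch of how PaperHawk-style code might read them at startup (the dataclass and function are illustrative, not the app's actual code):

```python
import os
from dataclasses import dataclass

@dataclass
class LLMConfig:
    base_url: str
    model: str
    api_key: str
    embedding_model: str

def load_llm_config(env=os.environ) -> LLMConfig:
    """Read the Space's Variables/Secrets; fail fast if anything critical is missing."""
    if env.get("LLM_PROFILE") != "vllm":
        raise RuntimeError("expected LLM_PROFILE=vllm")
    api_key = env.get("VLLM_API_KEY")
    if not api_key:
        raise RuntimeError("VLLM_API_KEY secret is not set in the Space")
    return LLMConfig(
        base_url=env["VLLM_BASE_URL"],          # e.g. http://<droplet-ip>:8000/v1
        model=env["VLLM_MODEL"],                # must match what vllm serve loaded
        api_key=api_key,
        embedding_model=env.get("EMBEDDING_MODEL", "BAAI/bge-m3"),
    )
```

Failing fast here turns a missing Secret into a clear error in the Space logs instead of a confusing 401 on the first audit.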

After saving, the Space rebuilds automatically (the first cold build is the slow one, as described in step 7; subsequent rebuilds are much faster).


Step 7 — Wait for the build, then verify

The first build pulls and installs everything — Python 3.12-slim, OS deps, the PyTorch CPU wheel, the BAAI/bge-m3 model (~2.3 GB pre-download), and the rest of requirements.txt. Expect 8–15 minutes for the cold build.

Watch the build logs in the Space → Logs tab. When you see streamlit run app/main.py followed by "You can now view your Streamlit app in your browser", the Space is up.

Open the Space URL in a browser and click Audit Demo. If the vLLM endpoint is reachable, you'll see results in 20–25 seconds.

If you get an error like Connection refused or a long hang, check:

  1. The MI300X droplet is running and vllm serve is up (SSH in and check the vllm serve session from AMD_DEPLOYMENT.md step 6)
  2. The droplet's UFW has port 8000 open (run ufw status | grep 8000 on the droplet)
  3. The VLLM_BASE_URL in Space Secrets matches the droplet's current public IP (which changes on every recreate-from-snapshot)
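The quickest end-to-end connectivity check is vLLM's OpenAI-compatible /models endpoint, which answers without doing any GPU work. A stdlib-only sketch (run the probe from any machine that should be able to reach the droplet; the URL-joining helper is the deterministic part):

```python
import json
import urllib.request

def models_url(base_url: str) -> str:
    """VLLM_BASE_URL ends in /v1; the model-listing endpoint is /v1/models."""
    return base_url.rstrip("/") + "/models"

def probe(base_url: str, api_key: str, timeout: float = 5.0) -> list[str]:
    """Return the model ids the server reports, or raise.
    'Connection refused' here means the port/IP checks above failed;
    HTTP 401 means the api_key doesn't match vllm serve's --api-key."""
    req = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    return [m["id"] for m in data.get("data", [])]
```

A healthy endpoint should list Qwen/Qwen2.5-14B-Instruct; anything else points at a mismatch between VLLM_MODEL and what vllm serve actually loaded.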

Step 8 — Hide the YAML from the GitHub display (optional)

The YAML front-matter is needed for HF Spaces but looks ugly on GitHub — the renderer shows it as a key | value table at the top of the README, with no formatting.

Workaround: GitHub honors .github/README.md over the root README.md for the public repo display. We commit a copy of the README without the YAML block as .github/README.md:

```shell
mkdir -p .github
tail -n +12 README.md > .github/README.md   # skip the first 11 lines (the YAML + blank line)
# (optionally edit .github/README.md to use absolute raw-image URLs for paperhawk.jpeg)
git add .github/README.md
git commit -m "Add .github/README.md to hide HF YAML on GitHub display"
git push origin main
```

Now GitHub shows .github/README.md (clean), and HF Spaces still reads the root README.md (with YAML). One file, two faces.
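The tail -n +12 trick breaks silently if the YAML block ever grows. A length-agnostic alternative, sketched in Python, strips whatever sits between the first pair of --- markers:

```python
def strip_front_matter(text: str) -> str:
    """Drop a leading `--- ... ---` YAML block (plus one trailing blank line)."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return text  # no front matter; nothing to strip
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            rest = lines[i + 1:]
            if rest and rest[0].strip() == "":
                rest = rest[1:]  # also drop the blank separator line
            return "".join(rest)
    return text  # unterminated block; leave untouched
```

Wiring this into a small script (read README.md, write .github/README.md) keeps the two files in sync no matter how many YAML keys get added later.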


Plan A vs Plan B

| Aspect | Plan A (org Space) | Plan B (personal Space) |
| --- | --- | --- |
| Owner | lablab-ai-amd-developer-hackathon/paperhawk | <your-handle>/paperhawk |
| HF Special Prize | ✅ Qualifies | ❌ Does not qualify |
| Org-quota dependency | ⚠️ Yes (shared with other org Spaces) | ❌ No (independent) |
| Visibility | Public, on the org page | Public, on your profile |
| Setup steps | Same as above | Same as above |

If the org-quota is exhausted (we hit null quota limit 403 errors), the same code, same Dockerfile, same YAML, same env-var setup pushes to a personal Space and runs immediately. This was our Plan B safety net during the hackathon.


Common pitfalls

  • "Build failed: app port 7860 not reachable": Your Dockerfile is binding to a different port (probably Streamlit's default 8501). Change EXPOSE and CMD to use 7860.
  • YAML rendered as a Markdown table on the Space main page: The YAML is invalid. Most likely culprits: invalid colorTo (allowed: red/yellow/green/blue/indigo/purple/pink/gray, not orange), invalid sdk, missing --- opening line, BOM/whitespace before the first ---.
  • "binary files require Xet": You haven't run git lfs track + git lfs migrate import yet. The HF push rejects committed binaries that aren't LFS-blobs.
  • "Files larger than 10 MiB are not allowed": A single file is over 10 MB. Check git lfs ls-files first (an untracked pattern triggers the same rejection); if the file genuinely can't stay under the limit, move it out of the repo and link from the README.
  • "null quota limit" 403 error: Org-level hardware quota is exhausted. Wait for capacity, ping a lablab admin in Discord, or push to a personal Space (Plan B).
  • App loads but "Connection refused" on Audit Demo: The vLLM endpoint is down or the IP changed. SSH into the droplet and confirm vllm serve is running. Update VLLM_BASE_URL Secret if the IP rotated.
  • App loads but "401 Unauthorized" on every LLM call: The VLLM_API_KEY Secret doesn't match the --api-key you passed to vllm serve. They have to be byte-for-byte identical.

Cross-references