Nándorfi Vince committed on
Commit
3385e0e
·
1 Parent(s): 67b464c

Sync documentation overhaul from main (markdown only, LFS history preserved)

README.md CHANGED
@@ -10,159 +10,152 @@ short_description: Real-DI-Audit/14 rules/6 anti-halluc/LangGraph/Qwen/MI300X
10
  ---
11
 
12
  <p align="center">
13
- <img src="paperhawk.jpeg" alt="PaperHawk" width="900">
14
  </p>
15
 
16
  <h1 align="center">PaperHawk</h1>
17
 
18
  <p align="center">
19
  <strong>Agentic document intelligence on AMD MI300X</strong><br>
20
- Multi-document due diligence with deterministic domain checks and agentic LLM workflows.
21
  </p>
22
 
23
  <p align="center">
24
- <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
25
  <img src="https://img.shields.io/badge/python-3.12+-blue.svg" alt="Python">
26
  <img src="https://img.shields.io/badge/LangGraph-0.6-green.svg" alt="LangGraph">
27
  <img src="https://img.shields.io/badge/AMD-MI300X-red.svg" alt="AMD MI300X">
 
28
  </p>
29
 
30
  <p align="center">
31
- Built for the <a href="https://lablab.ai/event/amd-developer-hackathon"><strong>AMD Developer Hackathon × lablab.ai</strong></a> (May 2026).
32
  </p>
33
 
34
  ---
35
 
36
- ## What is this?
37
 
38
- A working AI system that ingests multiple business documents (invoices,
39
- contracts, delivery notes, purchase orders, financial reports) and:
40
 
41
- - **Extracts structured data** with anti-hallucination layers (5+1 stack)
42
- - **Detects risks** via 14 deterministic domain rules + LLM ensemble
43
- - **Cross-references documents** (three-way matching for audits, M&A DD)
44
- - **Answers questions** via 5-tool agentic chat with source citations
45
- - **Generates audit-ready reports** (DOCX export, JSON API)
46
 
47
- This is **not "just another RAG"** — it is a multi-agent orchestration of
48
- specialist nodes (audit / legal / compliance / financial) over a deterministic
49
- + LLM ensemble, with explicit anti-hallucination layers.
50
 
51
- ## Stack
52
 
53
- | Layer | Technology |
54
- |-------|------------|
55
- | Orchestration | **LangGraph 0.6** (4 graphs, 6 subgraphs, async-first, AsyncSqliteSaver) |
56
- | LLM | **Qwen 2.5 14B Instruct** via vLLM on **AMD Instinct MI300X** |
57
- | Embedding | **BAAI/bge-m3** (multilingual, 1024 dim, sentence-transformers) |
58
- | Vector store | **ChromaDB + BM25** hybrid (Reciprocal Rank Fusion) |
59
- | UI | **Streamlit** (5 tabs) — deployable as a **Hugging Face Space** |
60
- | Testing | pytest + Playwright |
61
 
62
- ## Architecture
63
 
64
- ```
65
- ┌─────────────────────────────────┐
66
- │ Streamlit UI (5 tabs) │
67
- └────────────┬────────────────────┘
68
-
69
- ┌────────────────────────┼────────────────────────┐
70
- │ │ │
71
- ┌───────▼──────┐ ┌────────▼────────┐ ┌──────▼──────┐
72
- pipeline │ │ chat_graph │ │ dd_graph │
73
- │ _graph │ │ (5 tools, 17 │ │ (multi- │
74
- (6 subgraphs)│ │ rule prompt) │ │ agent │
75
- └───────┬──────┘ └─────────────────┘ │ super-
76
- │ │ visor)
77
- │ ┌─────────────────────────┐ └─────────────┘
78
- ├──▶ ingest_subgraph │
79
- ├──▶ classify (per-doc) │
80
- ├──▶ extract_subgraph │
81
- ├──▶ rag_index_subgraph │
82
- ├──▶ compare_node (3-way) │
83
- └──▶ risk_subgraph │
84
- ├─ basic risk │
85
- ├─ 14 domain checks │
86
- ├─ LLM risk + 3 filters │
87
- ├─ plausibility │
88
- └─ duplicate (ISA 240) │
89
- ```
90
 
91
- See [ARCHITECTURE.md](ARCHITECTURE.md) for the full architecture.
92
 
93
- ## Quick start
94
 
95
- ### 1. Local dev (Ollama or dummy mode)
 
96
 
97
  ```bash
98
- git clone https://github.com/<YOUR_GH_USER>/document-intelligence-agentic-langgraph-amd
99
- cd document-intelligence-agentic-langgraph-amd
100
- python -m venv .venv && source .venv/bin/activate
101
- pip install -r requirements.txt
102
- cp .env.example .env
103
- # Edit .env: set LLM_PROFILE=dummy (no LLM) or LLM_PROFILE=ollama (Qwen 7B local)
104
-
105
- streamlit run app/main.py
106
  ```
107
 
108
- ### 2. Production (Qwen on AMD MI300X via vLLM)
 
109
 
110
  ```bash
111
- # On the AMD Developer Cloud MI300X instance:
112
- docker run --rm --device=/dev/kfd --device=/dev/dri --group-add video \
113
- --ipc=host --shm-size 16g \
114
- -p 8000:8000 \
115
- -e VLLM_MODEL=Qwen/Qwen2.5-14B-Instruct \
116
- rocm/vllm:latest \
117
- sh -c 'vllm serve $VLLM_MODEL --host 0.0.0.0 --port 8000 \
118
- --tensor-parallel-size 1 --max-model-len 32768'
119
-
120
- # On your machine (.env):
121
- LLM_PROFILE=vllm
122
- VLLM_BASE_URL=http://<mi300x-public-ip>:8000/v1
123
- VLLM_MODEL=Qwen/Qwen2.5-14B-Instruct
124
-
125
- streamlit run app/main.py
126
  ```
127
 
128
- See [docs/qwen-vllm-deployment.md](docs/qwen-vllm-deployment.md) for the full
129
- walkthrough including cost monitoring and a Plan B (Ollama fallback).
 
130
 
131
- ### 3. Hugging Face Space deploy
 
132
 
133
- See [docs/hf-space-deployment.md](docs/hf-space-deployment.md).
134
 
135
- ## Demo packages
136
 
137
- Three pre-built demo packages bundled in `test_data/`:
 
 
138
 
139
- - **Audit Demo** 3 invoices from the same supplier; the March one is 50%
140
- pricier (over-billing pattern detected by the package-level analyzer).
141
- - **DD Demo** — NDA + service agreement + amendment in an acquisition
142
- scenario (change-of-control + auto-renewal red flags).
143
- - **Compliance Demo** — 2 contracts; one is missing the GDPR Article 28 clause.
144
 
145
- Click the corresponding button on the **Upload** tab.
146
 
147
  ## Documentation
148
 
149
- - [ARCHITECTURE.md](ARCHITECTURE.md) architecture overview (English)
150
- - [docs/qwen-vllm-deployment.md](docs/qwen-vllm-deployment.md) — Qwen on AMD MI300X (English)
151
- - [docs/hf-space-deployment.md](docs/hf-space-deployment.md) Hugging Face Space deploy (English)
152
- - [docs/LANGGRAPH_ONBOARDING.md](docs/LANGGRAPH_ONBOARDING.md) onboarding for contributors (English)
153
- - [CLAUDE.md](CLAUDE.md) project-level Claude Code instructions
154
- - [NOTICE.md](NOTICE.md) — author intent (non-binding)
155
- - `docs/Teljes-rendszer-attekintes-langgraph_HU.md` — legacy Hungarian system overview (reference)
156
- - `docs/MUKODESI_LEIRAS_HU.md` — legacy Hungarian operations manual (reference)
157
 
158
- ## Built by
159
 
160
- **Team CsimpiCsirkek** for the AMD Developer Hackathon × lablab.ai (2026):
161
-
162
- - Nándorfi Vince
163
- - Vitai Tamás
164
- - Murcsik Gábor
165
 
166
  ## License
167
 
168
- **MIT** — see [LICENSE](LICENSE).
 
10
  ---
11
 
12
  <p align="center">
13
+ <img src="https://raw.githubusercontent.com/nandorfivince/paperhawk/main/paperhawk.jpeg" alt="PaperHawk" width="900">
14
  </p>
15
 
16
  <h1 align="center">PaperHawk</h1>
17
 
18
  <p align="center">
19
  <strong>Agentic document intelligence on AMD MI300X</strong><br>
20
+ Multi-document due diligence with deterministic compliance rules and a 6-layer anti-hallucination stack.
21
  </p>
22
 
23
  <p align="center">
24
+ <img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT">
25
  <img src="https://img.shields.io/badge/python-3.12+-blue.svg" alt="Python">
26
  <img src="https://img.shields.io/badge/LangGraph-0.6-green.svg" alt="LangGraph">
27
  <img src="https://img.shields.io/badge/AMD-MI300X-red.svg" alt="AMD MI300X">
28
+ <img src="https://img.shields.io/badge/Qwen-2.5%2014B-purple.svg" alt="Qwen 2.5 14B">
29
  </p>
30
 
31
  <p align="center">
32
+ Built for the <strong>AMD Developer Hackathon × lablab.ai</strong> (May 2026).
33
  </p>
34
 
35
  ---
36
 
37
+ ## What is PaperHawk?
38
 
39
+ PaperHawk is an **agentic multi-document intelligence platform** for auditors, lawyers, tax advisors, and DD analysts. It processes 3–50 PDFs simultaneously and detects **cross-document red flags humans miss** — like a 57.5% price drift across three invoices from the same supplier — using a multi-agent LangGraph orchestration on top of Qwen 2.5 14B Instruct served via vLLM on AMD Instinct MI300X.
 
40
 
41
+ It is **not** a chatbot. It is a typed-state, multi-graph reasoning system with deterministic compliance rules, verbatim source citations, and a quote validator that catches LLM hallucinations before they reach the user.
 
42
 
43
+ ## Why it matters
 
 
44
 
45
+ A senior auditor needs ~8 hours to thoroughly review a 50-page invoice/contract package. ChatGPT, Copilot, and Harvey handle one document at a time, hallucinate citations, and lack jurisdiction-specific compliance knowledge. PaperHawk handles the entire package, applies 14 statutory rules hand-coded in Python, and finishes a 3-document audit in **23.3 seconds** (61.7× faster than manual review) — with auditor-grade citations and ISA/GDPR/HU-VAT mappings.
46
 
47
+ ---
 
48
 
49
+ ## Technical highlights
50
 
51
+ - **Multi-agent LangGraph 0.6 orchestration** — 4 compiled graphs (pipeline, chat, DD, package_insights) + 6 reusable subgraphs with Send-API parallelism
52
+ - **5-tool agentic chat** with strict `[Source: filename.pdf]` citations validated by a post-processor (no provenance → no answer)
53
+ - **6-layer anti-hallucination stack** — `temperature=0`, verbatim source quotes, field-level confidence, plausibility validators, 3-stage LLM-risk filter chain, quote validator
54
+ - **Provider abstraction** with `configurable_alternatives` — vLLM (production) / Ollama (local dev) / dummy (CI) — swap with one env var, zero code changes
55
+ - **AMD Instinct MI300X via vLLM** — 192 GB HBM3, 27.6 GB model + 141 GB available KV cache, 307 t/s prompt + 252 t/s generation, 30.4% prefix cache hit rate
56
+ - **61.7× speedup** vs manual audit on a 3-document package (23.3 sec vs ~24 min)
57
+ - **Hugging Face Space deployable** with Docker SDK + Git LFS for binary assets
58
+
59
+ ## Domain highlights
60
+
61
+ - **14 deterministic statutory rules** hand-coded in Python (NOT prompt-engineered) — ISA 240/320/500 audit standards, HU VAT Act §169 mandatory invoice elements, Ptk. 6:98 disproportionate penalty clauses, Art. 22 tax-ID validation, GDPR Article 28 sub-processor language, Incoterms 2020, AML sanctions list (EU/OFAC fuzzy match)
62
+ - **Cross-document red flag detection** — three-way matching (invoice + delivery note + PO), package-level pricing anomalies, duplicate-invoice detection (ISA 240), change-of-control trigger detection (M&A DD)
63
+ - **Multi-agent DD assistant** — 4 specialists (audit / legal / compliance / financial) coordinated by a supervisor and a synthesizer for executive summaries
64
+ - **Auditor-grade citations** — every finding maps to a regulation source (HU VAT Act §169, ISA 500, GDPR Art. 28, etc.) with verbatim source quote
65
+ - **Multilingual ingest** — EN / HU / DE OCR via Tesseract, native PDF + DOCX, vision-first scanned-PDF fallback
66
+
67
+ ---
68
+
69
+ ## Try the live demo
 
70
 
71
+ **Public Hugging Face Space** (no signup, runs in browser):
72
 
73
+ <https://huggingface.co/spaces/Vincsipe/paperhawk>
74
 
75
+ Click **Audit Demo** in the Quick demo section. Three pre-bundled invoices process in ~25 seconds, and you'll see the cross-doc 57.5% price-drift flag, the 14 deterministic checks, and the auditor-grade citations.
76
+
77
+ Backed by an AMD MI300X vLLM endpoint serving Qwen 2.5 14B Instruct.
78
+
79
+ ---
80
+
81
+ ## Run it locally
82
+
83
+ Two options depending on whether you have a GPU or just want a quick smoke test.
84
+
85
+ ### Quick demo (~3 minutes, no GPU needed)
86
+
87
+ Uses the **deterministic dummy provider** — runs the full pipeline, all 14 domain checks, and the multi-agent orchestration without any LLM calls. Good for verifying the system runs end-to-end.
88
 
89
  ```bash
90
+ git clone https://github.com/nandorfivince/paperhawk
91
+ cd paperhawk
92
+ make install
93
+ LLM_PROFILE=dummy make dev
 
94
  ```
95
 
96
+ Open <http://localhost:8501> → **Audit Demo** button. Result in ~5 seconds (dummy provider returns deterministic test data).
97
+
98
+ ### Full demo (~10 minutes, ~16 GB VRAM recommended)
99
+
100
+ Uses **Ollama with Qwen 2.5 14B Instruct** (the same model we deployed to AMD MI300X via vLLM). On a consumer GPU such as an NVIDIA RTX 4090 (24 GB VRAM) or RTX PRO 4500 (32 GB VRAM), you'll see real, production-grade multi-agent reasoning.
101
 
102
  ```bash
103
+ git clone https://github.com/nandorfivince/paperhawk
104
+ cd paperhawk
105
+ make install
106
+
107
+ # Pull the model (one-time, ~9 GB download)
108
+ ollama pull qwen2.5:14b-instruct
109
+
110
+ # Run the app pointed at Ollama
111
+ LLM_PROFILE=ollama OLLAMA_MODEL=qwen2.5:14b-instruct \
112
+ streamlit run app/main.py --server.port=8501 --server.fileWatcherType=none
 
113
  ```
114
 
115
+ Open <http://localhost:8501> → **Audit Demo** button.
116
+
117
+ **Expected results on an RTX PRO 4500 (32 GB GDDR7)**:
118
 
119
+ - Audit Demo: ~80 seconds for 3 invoices, 17.5× speedup vs manual
120
+ - 8 risk findings (2 HIGH, 4 MEDIUM, 2 LOW), HU VAT Act §169 mappings
121
+ - Cross-doc package-level analyzer flags the 57.5% price-drift red flag
122
+ - Quote validator catches 4 of 6 hallucinated citations and downgrades them to `low` confidence
123
 
124
+ (On AMD MI300X via vLLM: ~23 seconds, a 61.7× speedup, roughly 3.4× faster than the Ollama run above.)
125
 
126
+ ### Docker compose (alternative)
127
 
128
+ ```bash
129
+ make run-local
130
+ ```
131
 
132
+ Spins up the Streamlit app + Ollama in containers. First run pulls the model (~9 GB).
 
133
 
134
+ ---
135
 
136
  ## Documentation
137
 
138
+ | Document | What it covers |
139
+ |---|---|
140
+ | [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) | LangGraph multi-graph design, 14 domain checks, anti-hallucination stack, multi-agent DD |
141
+ | [`docs/AMD_DEPLOYMENT.md`](docs/AMD_DEPLOYMENT.md) | How we deployed Qwen 2.5 14B via vLLM on AMD Instinct MI300X (DigitalOcean-powered AMD Developer Cloud) |
142
+ | [`docs/HUGGINGFACE_DEPLOYMENT.md`](docs/HUGGINGFACE_DEPLOYMENT.md) | How we deployed the Streamlit app as a public Hugging Face Space |
 
143
 
144
+ For the full submission brief with TAM/SAM, competitor analysis, and the live deployment validation results, see [`docs/SUBMISSION.md`](docs/SUBMISSION.md).
145
 
146
+ ---
 
147
 
148
  ## License
149
 
150
+ MIT — see [`LICENSE`](LICENSE). Use, fork, deploy commercially or non-commercially.
151
+
152
+ ## Built by
153
+
154
+ **Team csimpicsirkek** (`PÁKÁK the AI warriors!` on the lablab.ai platform):
155
+
156
+ - Vince Nándorfi — lead, LangGraph architecture, AMD adaptation
157
+ - Erika Nagy — silent partner
158
+ - Tamás Vitai
159
+ - Gábor Murcsik
160
+
161
+ For the AMD Developer Hackathon × lablab.ai, May 2026.
docs/AMD_DEPLOYMENT.md ADDED
@@ -0,0 +1,265 @@
 
1
+ # AMD MI300X Deployment
2
+
3
+ How we deployed Qwen 2.5 14B Instruct via vLLM on AMD Instinct MI300X using the AMD Developer Cloud (DigitalOcean-powered). End-to-end, with copy-paste commands and the costs we actually paid.
4
+
5
+ ---
6
+
7
+ ## What you get
8
+
9
+ - **AMD Instinct MI300X** — 192 GB HBM3 GPU, 20 vCPU, 240 GB RAM, 720 GB NVMe boot disk
10
+ - **vLLM 0.17.1 + ROCm 7.0** — pre-installed via the Quick Start image
11
+ - **OpenAI-compatible REST endpoint** at `http://<droplet-ip>:8000/v1`
12
+ - **Cost**: $1.99 / GPU / hour. Free $100 credit covers ~50 hours.
13
+
14
+ ---
15
+
16
+ ## Prerequisites
17
+
18
+ 1. **AMD AI Developer Program signup** — <https://www.amd.com/en/developer/ai-dev-program.html>
19
+ - Approval takes 1–2 business days; you receive a $100 cloud credit by email automatically
20
+ 2. **lablab.ai event enrollment** (for hackathon participants) — <https://lablab.ai/event/amd-developer-hackathon>
21
+ 3. **SSH key on your local machine** (we recommend a dedicated key, not your default GitHub key — see step 1 below)
22
+
23
+ ---
24
+
25
+ ## Step 1 — Generate a dedicated SSH key
26
+
27
+ The default `~/.ssh/id_ed25519` is often passphrase-protected and routed through a GNOME-keyring agent that interferes with non-interactive `ssh-add`. Sidestep it with a passphrase-less, dedicated key:
28
+
29
+ ```bash
30
+ ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_amd_paperhawk -N "" -C "you@paperhawk-amd"
31
+ cat ~/.ssh/id_ed25519_amd_paperhawk.pub
32
+ ```
33
+
34
+ Copy the public key to clipboard for the next step.
35
+
36
+ ---
37
+
38
+ ## Step 2 — Create a GPU Droplet
39
+
40
+ Go to <https://cloud.amd.com/> (or <https://amd.digitalocean.com/>) and click **Create a GPU Droplet** on the homepage card.
41
+
42
+ **Caution**: the left-sidebar `GPU Droplets` link routes to the CPU Droplet flow as of May 2026 (a UI bug). Use the homepage card or the top-right `Create ▼` dropdown.
43
+
44
+ ### Configuration
45
+
46
+ - **GPU Plan**: AMD MI300X (single-GPU, $1.99/hr) — **not** the 8-GPU variant
47
+ - **Region**: ATL1 (Atlanta) — NYC1 is often "out of capacity" for MI300X. If the Plan card is greyed out, the URL parameter `?region=atl1` switches you over.
48
+ - **Image**: Quick Start → vLLM (0.17.1, ROCm 7.0) — comes with Docker, JupyterLab, and a pre-built `rocm` container
49
+ - **SSH Key**: Add a new key, paste the public key from step 1, name it `paperhawk-amd-deploy`
50
+ - **Visibility**: doesn't matter; the droplet is private to your account
51
+
52
+ Click **Create GPU Droplet**. It takes 5–10 minutes to come up. Once `Active`, note the Public IPv4 address.
53
+
54
+ ---
55
+
56
+ ## Step 3 — SSH in
57
+
58
+ ```bash
59
+ ssh -i ~/.ssh/id_ed25519_amd_paperhawk -o IdentityAgent=none root@<DROPLET_IP>
60
+ ```
61
+
62
+ The `-o IdentityAgent=none` flag bypasses the GNOME-keyring SSH agent if it's misbehaving on your local machine.
63
+
64
+ You'll see a welcome banner with two key facts:
65
+
66
+ ```
67
+ Access the Jupyter Server: http://<IP>:80 (we don't use this)
68
+ docker exec -it rocm /bin/bash (we DO use this)
69
+ ```
70
+
71
+ ---
72
+
73
+ ## Step 4 — Open port 8000 in the firewall
74
+
75
+ The Quick Start image ships with UFW enabled, allowing only SSH (22), HTTP (80), and HTTPS (443). vLLM runs on 8000, so we need to open it:
76
+
77
+ ```bash
78
+ ufw allow 8000
79
+ ufw status | grep 8000
80
+ ```
81
+
82
+ You should see `8000 ALLOW Anywhere` and the IPv6 equivalent.
83
+
84
+ The `--api-key` flag we pass to vLLM in step 6 prevents anyone scanning the public internet from using your endpoint — opening port 8000 is safe with API-key auth.
85
+
86
+ ---
87
+
88
+ ## Step 5 — (Optional) System upgrade and reboot
89
+
90
+ The Quick Start image ships with ~120 outdated packages, several of them pending security updates. Upgrading is recommended before snapshotting:
91
+
92
+ ```bash
93
+ apt-get update && DEBIAN_FRONTEND=noninteractive apt-get upgrade -y
94
+ reboot
95
+ ```
96
+
97
+ Wait ~1.5–2 minutes, then SSH in again. **The `rocm` Docker container does not auto-restart after the reboot**, so:
98
+
99
+ ```bash
100
+ docker start rocm
101
+ docker ps # confirm `rocm` is Up
102
+ ```
103
+
104
+ ---
105
+
106
+ ## Step 6 — Start vLLM serving Qwen 2.5 14B
107
+
108
+ Enter the Docker container:
109
+
110
+ ```bash
111
+ docker exec -it rocm /bin/bash
112
+ ```
113
+
114
+ Run vLLM in one long line (line continuations with `\` sometimes break under paste — single-line is most reliable):
115
+
116
+ ```bash
117
+ vllm serve Qwen/Qwen2.5-14B-Instruct --api-key sk-paperhawk-2026 --port 8000 --host 0.0.0.0 --enable-auto-tool-choice --tool-call-parser hermes --trust-remote-code
118
+ ```
119
+
120
+ What this does:
121
+
122
+ | Flag | Why |
123
+ |---|---|
124
+ | `Qwen/Qwen2.5-14B-Instruct` | Model ID on Hugging Face Hub. vLLM auto-downloads on first run (~28 GB, ~6 sec from ATL DC) |
125
+ | `--api-key sk-paperhawk-2026` | Bearer token required by every request. Anti-misuse for the public-internet endpoint. |
126
+ | `--port 8000` | OpenAI-compat REST at `:8000/v1` |
127
+ | `--host 0.0.0.0` | Bind on all interfaces so the public IP is reachable |
128
+ | `--enable-auto-tool-choice` + `--tool-call-parser hermes` | Required for our 5-tool agentic chat. Qwen 2.5 uses Hermes-style tool calls. |
129
+ | `--trust-remote-code` | Qwen 2.5's tokenizer no longer needs custom code in current transformers, so the flag is effectively a no-op here; kept for compatibility with older builds |
130
+
131
+ **What you'll see on first run** (~70 seconds total):
132
+
133
+ ```
134
+ INFO 05-04 20:56:36 [utils.py:302] (vLLM ASCII-art banner) version 0.17.1
135
+ INFO 05-04 20:56:36 [utils.py:302] (vLLM ASCII-art banner) model Qwen/Qwen2.5-14B-Instruct
136
+ config.json: 100%|████████████████████| 663/663 [00:00<00:00, 8.25MB/s]
137
+ model-00001-of-00008.safetensors: 100%|██████| 3.89G/3.89G [00:05<00:00, 745MB/s]
138
+ ... (8 shards, ~28 GB total in 5.9 sec)
139
+ INFO 05-04 20:57:08 [gpu_model_runner.py:4364] Model loading took 27.63 GiB memory and 17.358448 seconds
140
+ INFO 05-04 20:57:32 [gpu_worker.py:424] Available KV cache memory: 141.96 GiB
141
+ INFO 05-04 20:57:32 [kv_cache_utils.py:1314] GPU KV cache size: 775,280 tokens
142
+ INFO 05-04 20:57:32 [kv_cache_utils.py:1319] Maximum concurrency for 32,768 tokens per request: 23.66x
143
+ INFO: Application startup complete.
144
+ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
145
+ ```
146
+
147
+ The vLLM server now serves OpenAI-compatible requests. **Don't close this SSH session** — closing it kills the server. Open a second SSH window for the smoke test.
148
+
149
+ ---
150
+
151
+ ## Step 7 — Smoke-test the endpoint
152
+
153
+ From your local machine:
154
+
155
+ ```bash
156
+ # List models
157
+ curl http://<DROPLET_IP>:8000/v1/models -H "Authorization: Bearer sk-paperhawk-2026"
158
+
159
+ # Chat completion
160
+ curl http://<DROPLET_IP>:8000/v1/chat/completions \
161
+ -H "Content-Type: application/json" \
162
+ -H "Authorization: Bearer sk-paperhawk-2026" \
163
+ -d '{"model":"Qwen/Qwen2.5-14B-Instruct","messages":[{"role":"user","content":"Hello, who are you? Answer in one sentence."}],"max_tokens":50,"temperature":0}'
164
+ ```
165
+
166
+ Expected response: `"I am Qwen, a large language model created by Alibaba Cloud."`
167
+
168
+ If you get `401 Unauthorized`, the Bearer token is wrong (must match the `--api-key` value exactly). If you get `Connection refused`, port 8000 isn't open or the vLLM server didn't start — check the SSH window from step 6.
169
+
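+ The same smoke test from Python, for wiring your own client against the endpoint. A minimal sketch assuming the `openai` package (v1+) is installed; the placeholders match the curl example above:
+
+ ```python
+ from openai import OpenAI
+
+ # Point the standard OpenAI client at the vLLM endpoint.
+ client = OpenAI(
+     base_url="http://<DROPLET_IP>:8000/v1",  # your droplet's public IP
+     api_key="sk-paperhawk-2026",             # must match vLLM's --api-key
+ )
+
+ resp = client.chat.completions.create(
+     model="Qwen/Qwen2.5-14B-Instruct",
+     messages=[{"role": "user", "content": "Hello, who are you? Answer in one sentence."}],
+     max_tokens=50,
+     temperature=0,
+ )
+ print(resp.choices[0].message.content)
+ ```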
170
+ ---
171
+
172
+ ## Step 8 — Snapshot the droplet (cost optimization)
173
+
174
+ Once everything works, take a live snapshot. It captures the entire boot disk (~96 GB including the Docker container with the cached Qwen model), so a future restart is **30 seconds** instead of a 70-second cold start.
175
+
176
+ In the AMD Cloud UI:
177
+
178
+ 1. Droplet → **Backups & Snapshots** tab → **Take a Snapshot**
179
+ 2. Name: `paperhawk-vllm-tested-YYYY-MM-DD`
180
+ 3. Click **Take Live Snapshot** (live works fine — vLLM does only read-only inference)
181
+
182
+ The snapshot takes 10–15 minutes. Storage cost: $0.06 / GB / month × ~96 GB ≈ $5.76 / month, i.e. **~$0.19 / day**.
183
+
184
+ ---
185
+
186
+ ## Step 9 — Destroy the droplet (stop the meter)
187
+
188
+ When you're done with the dev session, **destroy** the droplet (do not just power-off — powered-off droplets still bill at $1.99/hr).
189
+
190
+ In the UI: Droplet → **Actions** ▼ → **Destroy** → type the droplet name to confirm.
191
+
192
+ **Important**: when the destroy dialog asks if you also want to destroy the snapshot, **leave it unchecked**. The snapshot survives the destroy and is what you'll use to recreate the droplet.
193
+
194
+ ---
195
+
196
+ ## Step 10 — Recreate from snapshot (Friday morning)
197
+
198
+ When you need the endpoint live again (e.g., for a demo or judging window):
199
+
200
+ 1. AMD Cloud → **Backups & Snapshots** → click `…` next to your snapshot → **Create GPU Droplet**
201
+ 2. Configuration: same MI300X / ATL1 / SSH key
202
+ 3. Wait 5–10 minutes for `Active`. Note the new public IP.
203
+
204
+ Then SSH in (with the new IP) and:
205
+
206
+ ```bash
207
+ docker start rocm
208
+ docker exec -it rocm /bin/bash
209
+ vllm serve Qwen/Qwen2.5-14B-Instruct --api-key sk-paperhawk-2026 --port 8000 --host 0.0.0.0 --enable-auto-tool-choice --tool-call-parser hermes --trust-remote-code
210
+ ```
211
+
212
+ Because the snapshot includes the cached model in the Docker container layer, **vLLM startup is ~30 seconds** instead of 70.
213
+
214
+ ---
215
+
216
+ ## Live performance numbers (measured)
217
+
218
+ From our end-to-end test on May 5, 2026:
219
+
220
+ | Metric | Value |
221
+ |---|---|
222
+ | HF Hub model download (8 safetensors, 28 GB) | 5.9 sec (8 shards in parallel at 700+ MB/s each, from the ATL DC) |
223
+ | Model load to MI300X VRAM | 17.4 sec |
224
+ | CUDA graph compile (51 size-buckets) | 20.5 sec |
225
+ | **Total cold-start** | **~70 sec** |
226
+ | **Warm restart from snapshot** | **~30 sec** |
227
+ | Available KV cache (192 GB − 27.6 GB model − 22 GB headroom) | 141.96 GiB |
228
+ | KV cache token capacity | 775,280 tokens |
229
+ | Max concurrency at 32k context | 23.66× parallel requests |
230
+ | Prompt throughput (live audit demo) | 307 tokens/sec |
231
+ | Generation throughput (live audit demo) | 252 tokens/sec |
232
+ | Prefix cache hit rate (multi-agent prompts) | 30.4% |
233
+ | End-to-end audit demo (3 PDFs from HF Space) | 23.3 sec / 61.7× speedup vs manual |
234
+
235
+ ---
236
+
237
+ ## Cost breakdown (our actual hackathon spend)
238
+
239
+ | Item | Cost |
240
+ |---|---|
241
+ | Initial dev session (provisioning, vLLM setup, debugging) | ~$3 |
242
+ | Live validation session (30 minutes) | ~$1 |
243
+ | Snapshot storage (5 days from Tuesday to Friday) | ~$1 |
244
+ | Live judging window (estimated 24 hours) | ~$48 |
245
+ | **Total estimated** | **~$53** of the free $100 credit |
246
+
247
+ Plenty of buffer for a longer judging window or a second iteration.
248
+
249
+ ---
250
+
251
+ ## Common pitfalls
252
+
253
+ - **"Out of capacity in the selected region"**: Switch to ATL1. NYC1 frequently runs out of MI300X. Pass `?region=atl1` in the Create-Droplet URL.
254
+ - **`Permission denied (publickey)` on SSH**: Either the `~/.ssh/id_ed25519` is passphrase-protected and the agent isn't unlocked, or you have the wrong key. Use a dedicated passphrase-less key (step 1) and `-o IdentityAgent=none` on the ssh command.
255
+ - **vLLM exits with `Triton FlashAttention error` on first run**: Older vLLM 0.8.x builds had this issue. The 0.17.1 + ROCm 7.0 build we use has it fixed. If you're stuck on an older image, prefix with `VLLM_USE_TRITON_FLASH_ATTN=0`.
256
+ - **Docker container `rocm` not running after reboot**: run `docker start rocm` manually; the container is not auto-started by default.
257
+ - **Powered-off droplet still billing**: Power-off does **not** stop billing. Only **Destroy** does. Snapshot first if you want to keep the state.
258
+
259
+ ---
260
+
261
+ ## Cross-references
262
+
263
+ - [`docs/HUGGINGFACE_DEPLOYMENT.md`](HUGGINGFACE_DEPLOYMENT.md) — how the Streamlit Space talks to this vLLM endpoint
264
+ - [`docs/ARCHITECTURE.md`](ARCHITECTURE.md) — how the application uses the vLLM endpoint via the provider abstraction
265
+ - [`docs/AMD_DEPLOY_LESSONS_LEARNED.md`](AMD_DEPLOY_LESSONS_LEARNED.md) — extended history of every push iteration, error message, and workaround we hit
docs/ARCHITECTURE.md ADDED
@@ -0,0 +1,229 @@
 
1
+ # PaperHawk Architecture
2
+
3
+ How PaperHawk is built and why each piece is where it is. This document explains the multi-graph LangGraph orchestration, the 14 deterministic domain checks, the 6-layer anti-hallucination stack, and the multi-agent DD assistant.
4
+
5
+ ---
6
+
7
+ ## High-level architecture
8
+
9
+ ```
10
+ ┌──────────────────────────────────────────────────────────────────────────┐
11
+ │ USER (Streamlit 5-tab UI) │
12
+ │ Upload │ Results │ Chat │ DD Assistant │ Report │
13
+ └────────────────────────────────┬─────────────────────────────────────────┘
14
+
15
+ ┌────────────────────┼────────────────────────┐
16
+ │ │ │
17
+ ▼ ▼ ▼
18
+ ┌──────────────────┐ ┌──────────────────┐ ┌─────────────────────────┐
19
+ │ pipeline_graph │ │ chat_graph │ │ dd_graph │
20
+ │ │ │ │ │ │
21
+ │ Ingest → │ │ Intent classify │ │ Contract filter → │
22
+ │ Classify → │ │ → Plan → │ │ Per-contract summary → │
23
+ │ Extract → │ │ Agent (5 tools) │ │ Multi-agent specialists │
24
+ │ Compare → │ │ → Synthesizer → │ │ (audit/legal/compliance │
25
+ │ Risk → │ │ Validator │ │ /financial) → │
26
+ │ Report │ │ ([Source: …]) │ │ Supervisor → Synthesizer│
27
+ └──────────────────┘ └──────────────────┘ └─────────────────────────┘
28
+ │ │
29
+ └─────────────┬──────────────────────────┘
30
+
31
+ ┌──────────────────────────┐
32
+ │ package_insights_graph │
33
+ │ │
34
+ │ Cross-document analysis │
35
+ │ (price-drift, dupes, │
36
+ │ three-way matching) │
37
+ └──────────────────────────┘
38
+
39
+
40
+ ┌──────────────────────────┐
41
+ │ Provider abstraction │
42
+ │ (configurable_alternatives)
43
+ │ │
44
+ │ vLLM ←→ Ollama ←→ Dummy │
45
+ └──────────────────────────┘
46
+
47
+
48
+ ┌──────────────────────────┐
49
+ │ AMD MI300X (vLLM) │
50
+ │ Qwen 2.5 14B Instruct │
51
+ │ 192 GB HBM3, ROCm 7.0 │
52
+ └──────────────────────────┘
53
+ ```
54
+
55
+ ---
56
+
57
+ ## Compiled graphs (4)
58
+
59
+ Every entry-point in the system is a separately compiled LangGraph artifact with its own typed state and `AsyncSqliteSaver` checkpointer:
60
+
61
+ ### 1. `pipeline_graph` — the document processing pipeline
62
+
63
+ The 6-step end-to-end flow when the user uploads a package:
64
+
65
+ 1. **Ingest** — PDF (PyMuPDF + pdfplumber for table extraction), DOCX (native), images (vision-first via the LLM), with Tesseract OCR fallback for scanned PDFs (EN/HU/DE)
66
+ 2. **Classify** — 6-way doc-type classifier with structured output (`invoice`, `delivery_note`, `purchase_order`, `contract`, `financial_report`, `other`); ISA 500 evidence-quality score
67
+ 3. **Extract** — per doc-type Pydantic v2 schema with `_quotes` and `_confidence` fields (shape sketched below); universal fallback schema for unknown types
68
+ 4. **Compare** — three-way matching subgraph (invoice + delivery note + PO), duplicate-invoice detection (ISA 240)
69
+ 5. **Risk** — basic plausibility + 14 domain checks (Send-API parallel fan-out) + LLM risk ensemble + 3-stage filter chain
70
+ 6. **Report** — DOCX export, JSON output, Streamlit UI rendering
71
+
72
+ State: `PipelineState` (Pydantic), with reducers for risk lists and per-document results.
73
+
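+ To make step 3 concrete, a hedged sketch of a per-doc-type schema. Field names are illustrative, not the repo's actual class; Pydantic v2 reserves leading-underscore attribute names, so the `_quotes` / `_confidence` fields are modeled via aliases:
+
+ ```python
+ from pydantic import BaseModel, Field
+
+ class InvoiceExtraction(BaseModel):
+     """Illustrative shape, not the repo's actual class."""
+     supplier_name: str
+     invoice_number: str
+     total_amount: float
+     currency: str
+     # Verbatim source quote per extracted field, serialized as "_quotes".
+     quotes: dict[str, str] = Field(default_factory=dict, alias="_quotes")
+     # Per-field confidence ("high" / "medium" / "low"), serialized as "_confidence".
+     confidence: dict[str, str] = Field(default_factory=dict, alias="_confidence")
+ ```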
74
+ ### 2. `chat_graph` — the agentic chat
75
+
76
+ 5-tool ReAct agent with strict citation enforcement:
77
+
78
+ - **Tools**: `list_documents`, `get_extraction`, `search_documents` (hybrid Chroma + BM25 with Reciprocal Rank Fusion), `compare_documents`, `validate_document`
79
+ - **Prompt**: 17-rule system prompt enforcing `[Source: filename.pdf]` format
80
+ - **Validator node**: post-processor that drops any answer without citations (sketched below)
81
+ - **Intent classifier**: routes to direct-answer vs tool-use paths to keep latency low for casual queries
82
+
83
+ State: `ChatState` with message history, retrieved chunks, and citation list.
84
+
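+ A sketch of the validator node's core rule (the regex and function name here are illustrative, not the repo's actual code):
+
+ ```python
+ import re
+
+ # Matches the citation format the 17-rule system prompt enforces.
+ CITATION_RE = re.compile(r"\[Source:\s*[^\]]+\.(pdf|docx)\]", re.IGNORECASE)
+
+ def validate_answer(answer: str) -> str | None:
+     """Pass the answer through only if it carries at least one citation."""
+     return answer if CITATION_RE.search(answer) else None
+ ```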
85
+ ### 3. `dd_graph` — the multi-agent DD assistant
86
+
87
+ For M&A due-diligence packages:
88
+
89
+ - **Contract filter** — selects only contract-type documents from the package
90
+ - **Per-contract summary** — extracts each contract's key terms (parties, term, value, change-of-control, non-compete, auto-renewal)
91
+ - **4 specialist agents** (run in parallel via the Send API; fan-out sketched after this section):
92
+ - `audit_specialist` — material misstatement risk, ISA 240 fraud indicators
93
+ - `legal_specialist` — change-of-control, non-compete, automatic-renewal red flags
94
+ - `compliance_specialist` — GDPR Art. 28 sub-processor language, AML counterparty checks
95
+ - `financial_specialist` — Ptk. 6:98 disproportionate penalty clauses, materiality thresholds
96
+ - **Supervisor** — coordinates specialists, drops business-normal noise
97
+ - **Synthesizer** — writes 3-paragraph executive summary
98
+
99
+ State: `DDState` with contract list, per-contract summaries, specialist findings, executive summary.
100
+
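+ The fan-out itself is a conditional edge that returns one `Send` per specialist. A sketch assuming a dict-shaped state (specialist node names taken from the list above):
+
+ ```python
+ from langgraph.types import Send
+
+ SPECIALISTS = ["audit_specialist", "legal_specialist",
+                "compliance_specialist", "financial_specialist"]
+
+ def fan_out_to_specialists(state: dict) -> list[Send]:
+     """Conditional-edge function: LangGraph runs the returned nodes in parallel."""
+     return [Send(name, {"contract": contract})
+             for name in SPECIALISTS
+             for contract in state["contracts"]]
+ ```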
101
+ ### 4. `package_insights_graph` — cross-document analysis
102
+
103
+ Package-level analyzers that don't fit into the per-document pipeline:
104
+
105
+ - **Pricing-drift detector** — flags > 30% price changes for the same line item across invoices in a package (caught the 57.5% drift in our live demo; sketched below)
106
+ - **Duplicate-invoice detector** — exact + near-match (date within 13 days, amount within 1%)
107
+ - **Counterparty consistency** — same supplier name spelled differently across documents
108
+
109
+ State: `PackageState` with per-document extractions and aggregated findings.
110
+
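+ A minimal sketch of the pricing-drift rule under the stated >30% threshold (field names are assumptions, not the repo's actual schema):
+
+ ```python
+ from collections import defaultdict
+
+ DRIFT_THRESHOLD = 0.30  # flag >30% change for the same line item
+
+ def price_drift_flags(invoices: list[dict]) -> list[str]:
+     """invoices: [{"file": str, "items": [{"name": str, "unit_price": float}]}]"""
+     prices = defaultdict(list)
+     for inv in invoices:
+         for item in inv["items"]:
+             prices[item["name"]].append((inv["file"], item["unit_price"]))
+     flags = []
+     for name, seen in prices.items():
+         base_file, base_price = seen[0]      # assumes a non-zero first price
+         for file, price in seen[1:]:
+             drift = (price - base_price) / base_price
+             if abs(drift) > DRIFT_THRESHOLD:
+                 flags.append(f"{name}: {drift:+.1%} ({base_file} → {file})")
+     return flags
+ ```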
111
+ ---
112
+
113
+ ## Subgraphs (6)
114
+
115
+ Reusable LangGraph subgraphs imported by the main graphs:
116
+
117
+ | Subgraph | Purpose |
118
+ |---|---|
119
+ | `extract_subgraph` | Per-document extraction with quote validator |
120
+ | `ingest_subgraph` | PDF/DOCX/image loading with OCR fallback |
121
+ | `llm_risk_subgraph` | LLM risk generation with structured output |
122
+ | `rag_index_subgraph` | Chunking, embedding, ChromaDB indexing |
123
+ | `rag_query_subgraph` | Hybrid Chroma + BM25 retrieval with RRF |
124
+ | `risk_subgraph` | Domain check fan-out + LLM risk + 3-stage filters |
125
+
126
+ ---
127
+
128
+ ## 14 deterministic domain checks
129
+
130
+ The check registry (`domain_checks/__init__.py`) is the heart of PaperHawk's auditor-grade output. Every check is a Python `Protocol` implementation, not an LLM prompt — they cannot hallucinate, can be unit-tested, and produce defensible findings with explicit regulation sources.
131
+
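+ A hedged sketch of that shape (the `Protocol` detail is from this doc; the method signature, the `Finding` class, and the field names are illustrative, reusing the Ptk. 6:98 threshold listed under A-tier below):
+
+ ```python
+ from typing import Protocol
+
+ class Finding:
+     """One defensible finding with an explicit regulation source."""
+     def __init__(self, severity: str, message: str, source: str):
+         self.severity = severity  # "HIGH" / "MEDIUM" / "LOW" / "INFO"
+         self.message = message
+         self.source = source      # e.g. "Ptk. 6:98", "ISA 240"
+
+ class DomainCheck(Protocol):
+     """The shape every registered check implements (signature assumed)."""
+     def run(self, extraction: dict) -> list[Finding]: ...
+
+ class DisproportionalityCheck:
+     """Ptk. 6:98 — a penalty clause above 31.7% of contract value is HIGH."""
+     def run(self, extraction: dict) -> list[Finding]:
+         penalty = extraction.get("penalty_amount", 0)
+         value = extraction.get("contract_value", 0)
+         if value and penalty / value > 0.317:
+             return [Finding("HIGH",
+                             f"Penalty is {penalty / value:.1%} of contract value",
+                             "Ptk. 6:98")]
+         return []
+ ```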
132
+ ### A-tier (essential)
133
+
134
+ 1. **Mandatory invoice elements** (HU VAT Act §169) — 18 required elements per invoice
135
+ 2. **Tax-ID checksum** (Art. 22 §) — mod-11 Hungarian tax-ID validation
136
+ 3. **Contract completeness** (Ptk. Book 6) — termination, governing law, penalty, confidentiality clauses
137
+ 4. **Disproportionality** (Ptk. 6:98) — penalty clause > 31.7% of contract value flagged HIGH
138
+ 5. **Rounded amounts** (ISA 240) — > 14.7% rounded amounts flagged suspicious, > 24.3% flagged HIGH
139
+ 6. **Evidence hierarchy** (ISA 500) — document-type reliability score (8/10 invoice, 7/10 contract)
140
+
141
+ ### B-tier (supplementary)
142
+
143
+ 7. **Materiality** (ISA 320) — 1.93% of document value as info-level threshold
144
+ 8. **GDPR Article 28** — 10 mandatory sub-processor language elements + PII detection
145
+ 9. **DD red flags** (M&A) — change-of-control, non-compete, automatic-renewal triggers
146
+
147
+ ### C-tier (informational)
148
+
149
+ 10. **Incoterms 2020** — 11 incoterm rules detected via regex word-boundaries
150
+ 11. **IFRS/HAR anomaly** — goodwill amortization flag, operational lease in IFRS context
151
+ 12. **Duplicate invoice** (ISA 240) — exact + near-match with 13-day date filter
152
+ 13. **AML sanctions** (Pmt.) — static EU/OFAC snapshot with fuzzy name match
153
+ 14. **Contract dates** — start-end consistency, expiry detection
154
+
155
+ **Jurisdiction-aware**: Hungarian-specific rules (HU VAT Act, Ptk., Art.) apply only to Hungarian documents. Universal rules (ISA, GDPR, Incoterms, AML) apply everywhere.
156
+
157
+ ---
158
+
159
+ ## 6-layer anti-hallucination stack
160
+
161
+ The system is designed so the LLM **cannot** lie about a document and have the lie pass through.
162
+
163
+ | Layer | What it does |
164
+ |---|---|
165
+ | 1. `temperature=0` | Deterministic outputs every run |
166
+ | 2. Source quote requirement | Every extraction must include a verbatim quote from the source PDF in `_quotes` |
167
+ | 3. Confidence scoring | high / medium / low per extracted field, surfaced to the user |
168
+ | 4. Plausibility validators | Deterministic Python checks for math, dates, totals, item-level VAT, currency normalization |
169
+ | 5. 3-stage LLM-risk filter chain | Drops business-normal noise, drops repeats of basic deterministic checks, drops contradictions |
170
+ | 6. Quote validator | Text-search the source PDF for the claimed quote; downgrade confidence if not found verbatim, drop entirely if obviously fabricated |
171
+
172
+ In our live audit demo, layer 6 caught **4 of 6** hallucinated citations from Qwen 2.5 14B and downgraded them to `low` confidence.
173
+
174
+ The `validation/` package is one of the most-edited folders in the repo precisely because we treat anti-hallucination as a first-class concern, not a guardrail layer slapped on top.
175
+
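+ Layer 6's core idea fits in a few lines. A minimal sketch, assuming whitespace-normalized substring search (the repo's matching rules may be more nuanced):
+
+ ```python
+ def validate_quote(claimed_quote: str, source_text: str) -> str:
+     """Decide what to do with a claimed verbatim quote: keep / downgrade / drop."""
+     norm = " ".join(claimed_quote.split())      # collapse PDF line breaks
+     haystack = " ".join(source_text.split())
+     if norm and norm in haystack:
+         return "keep"       # found verbatim
+     # Partial match on longer sentence fragments: keep the field, mark it "low".
+     fragments = [f for f in norm.split(". ") if len(f) > 20]
+     if any(f in haystack for f in fragments):
+         return "downgrade"
+     return "drop"           # nothing matches: treat as fabricated
+ ```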
176
+ ---
177
+
178
+ ## Provider abstraction
179
+
180
+ `configurable_alternatives` lets us swap LLM backends with a single env var:
181
+
182
+ | `LLM_PROFILE` | Backend | Use case |
183
+ |---|---|---|
184
+ | `vllm` | vLLM REST endpoint (OpenAI-compatible) | Production on AMD MI300X |
185
+ | `ollama` | Local Ollama at `localhost:11434` | Dev on consumer GPU |
186
+ | `dummy` | Deterministic stub | CI tests, smoke tests, judge quick-demo |
187
+
188
+ The application code never imports an LLM SDK directly — all calls go through `providers/` factory functions with `configurable_alternatives`. Switching from Anthropic Claude (our original dev target) to Qwen on vLLM required **zero application code changes** — only env vars.
189
+
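+ A sketch of the factory, using LangChain's `configurable_alternatives` (the class choices, the env-var plumbing, and the `FakeListChatModel` stand-in for the dummy profile are assumptions about the repo's actual wiring):
+
+ ```python
+ import os
+ from langchain_core.language_models.fake_chat_models import FakeListChatModel
+ from langchain_core.runnables import ConfigurableField
+ from langchain_ollama import ChatOllama
+ from langchain_openai import ChatOpenAI
+
+ def make_llm():
+     """One runnable, three backends, selected by the llm_profile config key."""
+     vllm = ChatOpenAI(
+         base_url=os.getenv("VLLM_BASE_URL"),
+         api_key=os.getenv("VLLM_API_KEY"),
+         model=os.getenv("VLLM_MODEL", "Qwen/Qwen2.5-14B-Instruct"),
+         temperature=0,
+     )
+     return vllm.configurable_alternatives(
+         ConfigurableField(id="llm_profile"),
+         default_key="vllm",
+         ollama=ChatOllama(model=os.getenv("OLLAMA_MODEL", "qwen2.5:14b-instruct"),
+                           temperature=0),
+         dummy=FakeListChatModel(responses=["{}"]),  # deterministic CI stub
+     )
+
+ # Swap backends with the env var alone, zero code changes:
+ llm = make_llm().with_config(
+     configurable={"llm_profile": os.getenv("LLM_PROFILE", "vllm")})
+ ```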
190
+ ---
191
+
192
+ ## Embedding + retrieval
193
+
194
+ - **Model**: BAAI/bge-m3 (1024-dim, multilingual EN/HU/DE/FR via sentence-transformers)
195
+ - **Storage**: ChromaDB persistent (per-session) + BM25 in-memory keyword index
196
+ - **Hybrid retrieval**: Reciprocal Rank Fusion of Chroma top-K and BM25 top-K (sketched below)
197
+ - **Chunking**: Natural-boundary chunking (paragraph-aware, ~500 tokens with overlap)
198
+
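+ Reciprocal Rank Fusion in its standard form, as a sketch (k=60 is the conventional constant from the original RRF paper; the repo's exact value is not stated here):
+
+ ```python
+ def rrf_merge(chroma_ids: list[str], bm25_ids: list[str], k: int = 60) -> list[str]:
+     """Fuse two ranked ID lists: score(doc) = sum of 1 / (k + rank) over both lists."""
+     scores: dict[str, float] = {}
+     for ranking in (chroma_ids, bm25_ids):
+         for rank, doc_id in enumerate(ranking, start=1):
+             scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
+     return sorted(scores, key=scores.get, reverse=True)
+ ```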
199
+ The embedding model loads once at app startup (~2.3 GB to RAM/VRAM). On first run it downloads from Hugging Face Hub to `~/.cache/huggingface/`.
200
+
201
+ ---
202
+
203
+ ## State persistence
204
+
205
+ - **Per-session**: Streamlit `session_state` for UI state (uploaded files, current package)
206
+ - **Per-graph**: `AsyncSqliteSaver` checkpointer at `data/checkpoints.sqlite` for LangGraph state
207
+ - **Vector store**: ChromaDB at `chroma_db/` (gitignored)
208
+
209
+ Restarting the app loads the last checkpoint, so chat history and extraction results survive a restart.
210
+
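+ Wiring a graph to the checkpointer looks roughly like this (a sketch; `builder` stands in for any of the compiled graphs' `StateGraph` builders):
+
+ ```python
+ from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver
+
+ async def run_with_checkpoints(builder, state: dict, thread_id: str):
+     # Checkpoints land in data/checkpoints.sqlite; one thread_id per session.
+     async with AsyncSqliteSaver.from_conn_string("data/checkpoints.sqlite") as saver:
+         graph = builder.compile(checkpointer=saver)
+         config = {"configurable": {"thread_id": thread_id}}
+         return await graph.ainvoke(state, config=config)
+ ```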
211
+ ---
212
+
213
+ ## Streamlit UI (5 tabs)
214
+
215
+ 1. **Upload** — drag-and-drop (PDF, DOCX, PNG, JPG, TXT), 200 MB per file, plus 3 pre-bundled demo packages
216
+ 2. **Results** — classification confidence, extracted data, risks per document, package-level cross-doc analysis
217
+ 3. **Chat** — agentic chat with `[Source: filename.pdf]` citations
218
+ 4. **DD Assistant** — for M&A packages: per-contract summaries + 4 specialist findings + executive summary + downloadable DOCX
219
+ 5. **Report** — JSON output + DOCX export
220
+
221
+ The async runtime uses a long-lived background event loop (`app/async_runtime.py`) so the UI stays responsive during multi-minute pipeline runs.
222
+
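+ The usual shape of that pattern, as a minimal sketch (the repo's `app/async_runtime.py` may differ in detail):
+
+ ```python
+ import asyncio
+ import threading
+
+ # One background loop for the whole process, started once at import time.
+ _loop = asyncio.new_event_loop()
+ threading.Thread(target=_loop.run_forever, daemon=True).start()
+
+ def run_async(coro):
+     """Submit a coroutine to the background loop and block until it finishes,
+     keeping Streamlit's script thread free of its own event loop."""
+     return asyncio.run_coroutine_threadsafe(coro, _loop).result()
+ ```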
223
+ ---
224
+
225
+ ## Cross-references
226
+
227
+ - [`docs/AMD_DEPLOYMENT.md`](AMD_DEPLOYMENT.md) — how the production vLLM endpoint runs on AMD MI300X
228
+ - [`docs/HUGGINGFACE_DEPLOYMENT.md`](HUGGINGFACE_DEPLOYMENT.md) — how the Streamlit app deploys as a public HF Space
229
+ - [`docs/SUBMISSION.md`](SUBMISSION.md) — full hackathon submission brief with TAM/SAM, competitor positioning, live deployment validation
docs/HF_SPACE_DEFAULT_GETTING_STARTED.md DELETED
@@ -1,193 +0,0 @@
1
- # HF Space Default Getting Started — Snapshot 2026-05-05
2
-
3
- After the `lablab-ai-amd-developer-hackathon/paperhawk` Space is created, HF Spaces shows a default "Get Started" guide. We archive it here as a reference, because its default Dockerfile pattern is a useful reference for rewriting the paperhawk Dockerfile (port 8501 → 7860, user-setup pattern).
4
-
5
- **Source**: appeared at the bottom of the Space page, after the default README.
6
-
7
- **URL**: https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/paperhawk
8
-
9
- **Context**: the Space freshly created, with Docker SDK + Blank template + the `Real-DI-Audit/14 rules/6 anti-halluc/LangGraph/Qwen/MI300X` short description.
10
-
11
- ---
12
-
13
- ## Get started with your Docker Space!
14
-
15
- Your space has been created, follow these steps to get started (or read the full [documentation](https://huggingface.co/docs/hub/spaces-sdks-docker))
16
-
17
- ### Start by cloning this repo by using:
18
-
19
- **HTTPS:**
20
-
21
- ```bash
22
- git clone https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/paperhawk
23
- ```
24
-
25
- **SSH:**
26
-
27
- ```bash
28
- git clone git@hf.co:spaces/lablab-ai-amd-developer-hackathon/paperhawk
29
- ```
30
-
31
- ### Make sure you're CLI v2.x.x or above:
32
-
33
- ```bash
34
- curl -LsSf https://hf.co/cli/install.sh | sh
35
- ```
36
-
37
- ### Download the Space:
38
-
39
- ```bash
40
- hf download lablab-ai-amd-developer-hackathon/paperhawk --repo-type=space
41
- ```
42
-
43
- ---
44
-
45
- ## Let's create a simple Python app using FastAPI
46
-
47
- ### `requirements.txt`
48
-
49
- ```
50
- fastapi
51
- uvicorn[standard]
52
- ```
53
-
54
- > **Hint:** You can also create the requirements file directly in your browser.
55
-
56
- ### `app.py`
57
-
58
- ```python
59
- from fastapi import FastAPI
60
-
61
- app = FastAPI()
62
-
63
- @app.get("/")
64
- def greet_json():
65
- return {"Hello": "World!"}
66
- ```
67
-
68
- > **Hint:** You can also create the app file directly in your browser.
69
-
70
- ---
71
-
72
- ## Create your Dockerfile
73
-
74
- ```dockerfile
75
- # Read the doc: https://huggingface.co/docs/hub/spaces-sdks-docker
76
- # you will also find guides on how best to write your Dockerfile
77
-
78
- FROM python:3.9
79
-
80
- RUN useradd -m -u 1000 user
81
- USER user
82
- ENV PATH="/home/user/.local/bin:$PATH"
83
-
84
- WORKDIR /app
85
-
86
- COPY --chown=user ./requirements.txt requirements.txt
87
- RUN pip install --no-cache-dir --upgrade -r requirements.txt
88
-
89
- COPY --chown=user . /app
90
- CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
91
- ```
92
-
93
- > **Hint:** Alternatively, you can create the Dockerfile file directly in your browser.
94
-
95
- ---
96
-
97
- ## Then commit and push
98
-
99
- ```bash
100
- git add requirements.txt app.py Dockerfile
101
- git commit -m "Add application file"
102
- git push
103
- ```
104
-
105
- > Finally, your Space should be running on this page after a few moments!
106
-
107
- ---
108
-
109
- ## App port
110
-
111
- > Your Docker Space needs to listen on port `7860`.
112
-
113
- ## Personalize your Space
114
-
115
- Make your Space stand out by customizing its emoji, colors, and description by **editing metadata** in its `README.md` file.
116
-
117
- ## Documentation
118
-
119
- Read the full documentation for Docker Spaces [here](https://huggingface.co/docs/hub/spaces-sdks-docker).
120
-
121
- ---
122
-
123
- ## What this means for us (paperhawk-specific notes)
124
-
125
- ### The default Dockerfile vs the paperhawk Dockerfile
126
-
127
- The existing paperhawk Dockerfile is **more advanced** than the default example:
128
-
129
- | Aspect | HF default | Paperhawk |
130
- |---|---|---|
131
- | Python version | `python:3.9` | `python:3.12-slim` (more modern) |
132
- | User setup | `useradd -m -u 1000 user` + `USER user` (non-root, security best-practice) | NONE (root user) |
133
- | OS-deps | none | `tesseract-ocr` + `poppler-utils` + `libmupdf-dev` (PDF + OCR) |
134
- | Pre-download | none | `BAAI/bge-m3` 2.27 GB (build-time) |
135
- | App | `uvicorn` FastAPI | `streamlit` |
136
- | Port | **`7860`** | **`8501`** → **rewritten to 7860 for the HF Space** (2026-05-05) |
137
-
138
- ### The 2 main rewrites needed on the paperhawk Dockerfile
139
-
140
- 1. **Port switch 8501 → 7860** (done, 2026-05-05):
141
- - `EXPOSE 8501` → `EXPOSE 7860`
142
- - `--server.port=8501` → `--server.port=7860`
143
- - `HEALTHCHECK ... http://localhost:8501/_stcore/health` → `http://localhost:7860/_stcore/health`
144
-
145
- 2. **(optional) Add the user setup** as a security best practice:
146
- - `RUN useradd -m -u 1000 user`
147
- - `USER user`
148
- - `ENV PATH="/home/user/.local/bin:$PATH"`
149
- - `COPY --chown=user ...`
150
- - **HF Spaces does NOT strictly require this**, and the paperhawk stack runs fine as root.
151
-
152
- ### The README.md front matter
153
-
154
- HF Spaces requires a YAML front matter at the top of `README.md`. Inserted at the top of the paperhawk `README.md` (2026-05-05):
155
-
156
- ```yaml
157
- ---
158
- title: PaperHawk
159
- emoji: 🦅
160
- colorFrom: red
161
- colorTo: orange
162
- sdk: docker
163
- pinned: false
164
- license: mit
165
- short_description: Real-DI-Audit/14 rules/6 anti-halluc/LangGraph/Qwen/MI300X
166
- ---
167
- ```
168
-
169
- The existing paperhawk `README.md` content (the project README) follows after it. The front matter is only for the HF Space; it also renders on GitHub (which shows the YAML as a code block).
170
-
171
- ### The clone + push workflow for paperhawk
172
-
173
- On the existing paperhawk GitHub repo (`nandorfivince/paperhawk`) we add a new remote:
174
-
175
- ```bash
176
- cd ~/development/<host-paperhawk-path>
177
- git remote add space https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/paperhawk
178
- git push space main
179
- ```
180
-
181
- The first push asks for authentication — it wants the HF Hub token, which can be generated from the Vincsipe account at https://huggingface.co/settings/tokens (new token, "Write" scope).
182
-
183
- ### App port environment variable
184
-
185
- HF Spaces expects port `7860` by default. The paperhawk `streamlit` command is extended with the `--server.port=7860` flag in the `Dockerfile` (2026-05-05).
186
-
187
- ### HF Spaces hardware
188
-
189
- CPU Basic = free tier, 16 GB RAM, 2 vCPU. Plenty for the paperhawk Streamlit app (~3-5 GB RAM usage for bge-m3 + ChromaDB + Streamlit). The vLLM runs **separately** on the AMD MI300X; the Space references it via the `VLLM_BASE_URL` Secret.
190
-
191
- ### Sleep mode
192
-
193
- A free Space goes to sleep after 48 hours of inactivity. The first request after wake-up takes 30-60 sec. During judging it is worth pinging the Space **periodically** (e.g. UptimeRobot at a 30-minute interval), or sharing it in the Build-in-Public posts to keep it awake with organic traffic.
 
docs/HUGGINGFACE_DEPLOYMENT.md ADDED
@@ -0,0 +1,251 @@
 
1
+ # Hugging Face Spaces Deployment
2
+
3
+ How we deployed the PaperHawk Streamlit application as a public Hugging Face Space, with the AMD MI300X vLLM endpoint as its inference backend.
4
+
5
+ ---
6
+
7
+ ## What you get
8
+
9
+ - **Public Space URL** — a Streamlit app anyone can use in a browser, no signup
10
+ - **Free CPU Basic tier** — 16 GB RAM, 2 vCPU. The app runs here; the LLM runs on AMD MI300X via vLLM (separate Cloud).
11
+ - **Two paths**: under the `lablab-ai-amd-developer-hackathon` org (Plan A — qualifies for HF Special Prize), or under your personal account (Plan B — fallback if the org has hardware-quota issues)
12
+
13
+ Live example: <https://huggingface.co/spaces/Vincsipe/paperhawk>
14
+
15
+ ---
16
+
17
+ ## Prerequisites
18
+
19
+ 1. Hugging Face account (free)
20
+ 2. **Optional**: membership in the `lablab-ai-amd-developer-hackathon` org if submitting to the AMD Developer Hackathon (Plan A). The HF Special Prize requires the Space to live under this org.
21
+ 3. A running vLLM endpoint on AMD MI300X — see [`AMD_DEPLOYMENT.md`](AMD_DEPLOYMENT.md)
22
+ 4. The PaperHawk repo cloned locally with `Dockerfile`, `README.md`, and `app/main.py`
23
+
24
+ ---
25
+
26
+ ## Step 1 — Create the Space
27
+
28
+ Go to <https://huggingface.co/new-space> (or, if you're an org member, click `+ New` → `New Space` from the org page).
29
+
30
+ **Configuration**:
31
+
32
+ | Field | Value |
33
+ |---|---|
34
+ | Owner | `lablab-ai-amd-developer-hackathon` (Plan A) or your personal handle (Plan B) |
35
+ | Space name | `paperhawk` |
36
+ | Short description | `Real-DI-Audit/14 rules/6 anti-halluc/LangGraph/Qwen/MI300X` |
37
+ | License | `mit` |
38
+ | **Space SDK** | **Docker** (not Streamlit, not Gradio — see step 2) |
39
+ | **Template** | **Blank** (we ship our own Dockerfile) |
40
+ | Hardware | CPU Basic (free, 16 GB RAM) |
41
+ | Visibility | Public (required for the HF Special Prize) |
42
+
43
+ Click **Create Space**. You'll get an empty repo at:
44
+
45
+ ```
46
+ https://huggingface.co/spaces/<owner>/paperhawk
47
+ ```
48
+
49
+ **Why Docker SDK and not the Streamlit template?** As of 2026, the HF Spaces "Streamlit" SDK lives under the Docker tab as a managed template. We bypass the template because PaperHawk needs custom OS dependencies (Tesseract OCR for EN/HU/DE, poppler-utils for table extraction, libmupdf for PDFs) that the templated builder doesn't include. Our own Dockerfile is faster to debug and gives us a deterministic base image.
50
+
51
+ ---
52
+
53
+ ## Step 2 — Configure the Dockerfile for HF Spaces
54
+
55
+ The PaperHawk Dockerfile is HF-Spaces-ready out of the box, with one critical detail: **port 7860**.
56
+
57
+ ```dockerfile
58
+ # syntax=docker/dockerfile:1.6
59
+ FROM python:3.12-slim AS base
60
+
61
+ ENV PYTHONUNBUFFERED=1 PYTHONDONTWRITEBYTECODE=1
62
+
63
+ # OS deps
64
+ RUN apt-get update && apt-get install -y --no-install-recommends \
65
+ tesseract-ocr tesseract-ocr-eng tesseract-ocr-hun tesseract-ocr-deu \
66
+ poppler-utils libmupdf-dev curl \
67
+ && rm -rf /var/lib/apt/lists/*
68
+
69
+ WORKDIR /app
70
+
71
+ COPY requirements.txt .
72
+ RUN pip install --upgrade pip \
73
+ && pip install --index-url https://download.pytorch.org/whl/cpu torch \
74
+ && pip install -r requirements.txt
75
+
76
+ # Pre-download the embedding model so the first user request isn't slow
77
+ RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('BAAI/bge-m3')"
78
+
79
+ COPY . .
80
+
81
+ # HF Spaces expects port 7860 (NOT Streamlit's default 8501)
82
+ EXPOSE 7860
83
+ CMD ["streamlit", "run", "app/main.py", \
84
+ "--server.address=0.0.0.0", \
85
+ "--server.port=7860", \
86
+ "--server.headless=true"]
87
+ ```
88
+
89
+ **Why 7860?** HF Spaces' Docker hosting only routes traffic to port 7860 — the Streamlit default 8501 is invisible to the public URL. This is a one-line fix that's easy to miss.
90
+
91
+ ---
92
+
93
+ ## Step 3 — Configure the README YAML front-matter
94
+
95
+ HF Spaces reads the YAML block at the top of `README.md` to configure the Space card and build behavior. PaperHawk's:
96
+
97
+ ```yaml
98
+ ---
99
+ title: PaperHawk
100
+ emoji: 🦅
101
+ colorFrom: red
102
+ colorTo: yellow
103
+ sdk: docker
104
+ pinned: false
105
+ license: mit
106
+ short_description: Real-DI-Audit/14 rules/6 anti-halluc/LangGraph/Qwen/MI300X
107
+ ---
108
+ ```
109
+
110
+ **Critical**: `colorTo` must be one of `[red, yellow, green, blue, indigo, purple, pink, gray]`. We initially used `orange` (because the AMD brand color is orange) — HF rejected the YAML as invalid, and the Space card fell back to a generic theme **with the YAML rendered as a Markdown table at the top of the page**. Fixed by changing to `yellow`.
111
+
112
+ If the Space's main page shows a `title | PaperHawk` table at the top, the YAML is invalid and HF can't parse it — check the `colorTo` value first.
113
+
114
+ ---
115
+
116
+ ## Step 4 — Set up Git LFS for binary assets
117
+
118
+ HF Spaces has a strict rule: every binary file (`*.png`, `*.pdf`, `*.pptx`, `*.docx`, `*.jpg`, `*.mp4`) must live in **Xet storage** via Git LFS, not as a regular Git blob. The cover PNG, the slide PDF, the demo packages — all of these get rejected without LFS.
119
+
120
+ On your local machine:
121
+
122
+ ```bash
123
+ # One-time, in any repo with binary files
124
+ sudo apt install git-lfs # or `brew install git-lfs` on macOS
125
+ git lfs install
126
+ ```
127
+
128
+ In the PaperHawk repo:
129
+
130
+ ```bash
131
+ git lfs track "*.png" "*.pdf" "*.pptx" "*.docx" "*.jpeg" "*.jpg" "*.mp4"
132
+ git add .gitattributes
133
+ git commit -m "Track binary files via LFS"
134
+ ```
135
+
136
+ **Important**: `git lfs track` only updates `.gitattributes`. Existing commits with binaries-as-Git-blob are still rejected by HF. Migrate the entire history:
137
+
138
+ ```bash
139
+ git lfs migrate import --include="*.png,*.pdf,*.pptx,*.docx,*.jpeg,*.jpg,*.mp4"
140
+ ```
141
+
142
+ This rewrites the branch history so the binaries become LFS blobs. The next `git push` will upload them via Xet.
143
+
144
+ **Files over 10 MB**: HF Spaces also enforces a 10 MB hard limit per file even via LFS for the free Spaces tier. Any single video over 10 MB will be rejected. If you have demo videos, keep them as separate uploads on YouTube/Vimeo and link from the Space description.
145
+
146
+ ---
147
+
148
+ ## Step 5 — Add the Space as a git remote and push
149
+
150
+ ```bash
151
+ # Add a remote for the Space (token embedded in URL avoids dual auth-prompts)
152
+ HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxx # generate at https://huggingface.co/settings/tokens (Write scope, fine-grained, with org access if Plan A)
153
+ git remote add space https://<your-hf-username>:${HF_TOKEN}@huggingface.co/spaces/<owner>/paperhawk
154
+
155
+ # Push to the Space
156
+ git push --force space main
157
+ ```
158
+
159
+ **Why token in URL?** Git LFS uses a separate authentication channel from the regular Git push. Without the token in the URL, Git prompts for credentials twice and one of them silently times out. Putting the token in the URL handles both.
160
+
161
+ The first push uploads ~9 MB of LFS objects (the cover image, slide PDF, sample PDFs, sample DOCX). Subsequent pushes are fast (cached on HF's side).
162
+
163
+ ---
164
+
165
+ ## Step 6 — Add Space secrets
166
+
167
+ The app reads its LLM provider config from environment variables. In the Space:
168
+
169
+ **Settings** (top-right, on the Space page) → **Variables and secrets** → **+ New variable** for each:
170
+
171
+ | Key | Value | Type |
172
+ |---|---|---|
173
+ | `LLM_PROFILE` | `vllm` | Variable |
174
+ | `VLLM_BASE_URL` | `http://<MI300X_DROPLET_IP>:8000/v1` | Variable |
175
+ | `VLLM_MODEL` | `Qwen/Qwen2.5-14B-Instruct` | Variable |
176
+ | `EMBEDDING_MODEL` | `BAAI/bge-m3` | Variable |
177
+ | `VLLM_API_KEY` | `sk-paperhawk-2026` (the same token you passed to vLLM `--api-key`) | **Secret** |
178
+
179
+ The `VLLM_API_KEY` must be a **Secret**, not a Variable — Secrets are masked in the UI and not exposed via the public Space metadata.
180
+
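+ Inside the container, both Variables and Secrets arrive as plain environment variables, so the app-side read is uniform. An illustrative sketch (the helper name is hypothetical):
+
+ ```python
+ import os
+
+ def llm_settings() -> dict:
+     """Collect the LLM provider config the Space injects as env vars."""
+     return {
+         "profile": os.getenv("LLM_PROFILE", "dummy"),  # safe fallback for local runs
+         "base_url": os.getenv("VLLM_BASE_URL", ""),
+         "model": os.getenv("VLLM_MODEL", ""),
+         "api_key": os.getenv("VLLM_API_KEY", ""),      # the Secret; never log it
+     }
+ ```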
181
+ After saving, the Space rebuilds automatically (~5 minutes for first build, faster for subsequent).
182
+
183
+ ---
184
+
185
+ ## Step 7 — Wait for the build, then verify
186
+
187
+ The first build pulls and installs everything — Python 3.12-slim, OS deps, PyTorch CPU wheel, the BAAI/bge-m3 model (~2.3 GB pre-download), and the rest of `requirements.txt`. Expect 8–15 minutes for the cold build.
188
+
189
+ Watch the build logs in the Space → **Logs** tab. When you see `streamlit run app/main.py` and `You can now view your Streamlit app in your browser` the Space is up.
190
+
191
+ Open the Space URL in a browser and click **Audit Demo**. If the vLLM endpoint is reachable, you'll see results in 20–25 seconds.
192
+
193
+ If you get an error like `Connection refused` or a long hang, check the following (a one-line `curl` check follows the list):
194
+
195
+ 1. The MI300X droplet is running and `vllm serve` is up (SSH in and confirm the `vllm serve` process from `AMD_DEPLOYMENT.md` step 6 is still running)
196
+ 2. The droplet's UFW has port 8000 open (run `ufw status | grep 8000` on the droplet)
197
+ 3. The `VLLM_BASE_URL` in Space Secrets matches the droplet's current public IP (which changes on every recreate-from-snapshot)
198
+
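+ A minimal connectivity check, runnable from any machine (values are the ones from Step 6; substitute your droplet's current IP):
+
+ ```bash
+ # Expect a JSON model list containing Qwen/Qwen2.5-14B-Instruct.
+ # A hang points at UFW/networking (check 2); HTTP 401 points at an API-key mismatch.
+ curl -s http://<MI300X_DROPLET_IP>:8000/v1/models \
+   -H "Authorization: Bearer sk-paperhawk-2026"
+ ```
+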
199
+ ---
200
+
201
+ ## Step 8 — Hide the YAML from the GitHub display (optional)
202
+
203
+ The YAML front-matter is needed for HF Spaces but **looks ugly on GitHub**: the renderer displays it as a bare `key | value` table at the top of the README.
204
+
205
+ Workaround: GitHub honors `.github/README.md` over the root `README.md` for the public repo display. We commit a copy of the README **without** the YAML block as `.github/README.md`:
206
+
207
+ ```bash
208
+ mkdir -p .github
209
+ tail -n +12 README.md > .github/README.md # skip the first 11 lines (the YAML + blank line)
210
+ # (optionally edit .github/README.md to use absolute raw-image URLs for paperhawk.jpeg)
211
+ git add .github/README.md
212
+ git commit -m "Add .github/README.md to hide HF YAML on GitHub display"
213
+ git push origin main
214
+ ```
215
+
216
+ Now GitHub shows `.github/README.md` (clean), and HF Spaces still reads the root `README.md` (with YAML). One repo, two faces.
217
+
218
+ ---
219
+
220
+ ## Plan A vs Plan B
221
+
222
+ | Aspect | Plan A (org Space) | Plan B (personal Space) |
223
+ |---|---|---|
224
+ | Owner | `lablab-ai-amd-developer-hackathon/paperhawk` | `<your-handle>/paperhawk` |
225
+ | HF Special Prize | ✅ Qualifies | ❌ Disqualifies |
226
+ | Org-quota dependency | ⚠️ Yes (shared with other org Spaces) | ❌ Independent |
227
+ | Visibility | Public, on the org page | Public, on your profile |
228
+ | Setup steps | Same as above | Same as above |
229
+
230
+ If the org-quota is exhausted (we hit `null quota limit` 403 errors), the same code, same Dockerfile, same YAML, same env-var setup pushes to a personal Space and runs immediately; the remote swap is sketched below. This was our Plan B safety net during the hackathon.
231
+
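+ The switch itself is two commands (a sketch, reusing the `space` remote and `HF_TOKEN` from Step 5; `<your-handle>` is your personal HF username):
+
+ ```bash
+ git remote set-url space https://<your-handle>:${HF_TOKEN}@huggingface.co/spaces/<your-handle>/paperhawk
+ git push --force space main
+ ```
+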
232
+ ---
233
+
234
+ ## Common pitfalls
235
+
236
+ - **"Build failed: app port 7860 not reachable"**: Your Dockerfile is binding to a different port (probably Streamlit's default 8501). Change `EXPOSE` and `CMD` to use 7860.
237
+ - **YAML rendered as a Markdown table on the Space main page**: The YAML is invalid. Most likely culprits: invalid `colorTo` (allowed: red/yellow/green/blue/indigo/purple/pink/gray, **not** orange), invalid `sdk`, missing `---` opening line, BOM/whitespace before the first `---`.
238
+ - **"binary files require Xet"**: You haven't run `git lfs track` + `git lfs migrate import` yet. The HF push rejects committed binaries that aren't LFS-blobs.
239
+ - **"Files larger than 10 MiB are not allowed"**: A single file is over 10 MB even after LFS. Move it out of the repo and link from the README.
240
+ - **"null quota limit" 403 error**: Org-level hardware quota is exhausted. Wait for capacity, ping a lablab admin in Discord, or push to a personal Space (Plan B).
241
+ - **App loads but "Connection refused" on Audit Demo**: The vLLM endpoint is down or the IP changed. SSH into the droplet and confirm `vllm serve` is running. Update `VLLM_BASE_URL` Secret if the IP rotated.
242
+ - **App loads but "401 Unauthorized" on every LLM call**: The `VLLM_API_KEY` Secret doesn't match the `--api-key` you passed to `vllm serve`. They have to be byte-for-byte identical.
243
+
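+ For the first pitfall, the port-relevant Dockerfile lines look like this (a minimal sketch; the base image and entry path are the ones this guide already mentions, everything else is elided):
+
+ ```dockerfile
+ FROM python:3.12-slim
+ # ... OS deps, COPY, pip install, model pre-download ...
+ EXPOSE 7860
+ CMD ["streamlit", "run", "app/main.py", "--server.port=7860", "--server.address=0.0.0.0"]
+ ```
+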
244
+ ---
245
+
246
+ ## Cross-references
247
+
248
+ - [`docs/AMD_DEPLOYMENT.md`](AMD_DEPLOYMENT.md) — provisioning the AMD MI300X vLLM endpoint that this Space depends on
249
+ - [`docs/ARCHITECTURE.md`](ARCHITECTURE.md) — how the Streamlit app, the LangGraph multi-graph orchestrator, and the vLLM endpoint fit together
250
+ - [`docs/HF_SPACE_DEFAULT_GETTING_STARTED.md`](HF_SPACE_DEFAULT_GETTING_STARTED.md) — the canonical HF Spaces Quick Start that this guide builds on
251
+ - [`docs/SUBMISSION.md`](SUBMISSION.md) — full hackathon submission brief
docs/SUBMISSION.md CHANGED
@@ -19,7 +19,39 @@
19
 
20
  ---
21
 
22
- ## Long Description
23
 
24
  ### The Problem
25
 
@@ -147,23 +179,94 @@ One codebase, one MIT license, three prize pools.
147
  | Project Title | DONE | `PaperHawk` |
148
  | Short Description | DONE | 247 characters, A+C blend |
149
  | Long Description | DONE | 10 sections, builder-energy tone |
150
- | Cover Image | DONE | `paperhawk.jpeg` (2048 × 819 px) |
151
  | Technology & Category Tags | DONE | 12 tags |
152
  | Public GitHub Repository | DONE | `github.com/nandorfivince/paperhawk` |
153
- | Video Presentation | TODO | Demo walkthrough video |
154
- | Slide Presentation | TODO | 5–8 slide deck |
155
- | Demo Application URL | TODO | HF Space public URL |
156
- | HF Space URL | TODO | Under `lablab-ai-amd-developer-hackathon` org |
157
 
158
  ---
159
 
160
  ## Submission URLs (filled at submission time)
161
 
162
  - **GitHub repo**: https://github.com/nandorfivince/paperhawk
163
- - **Hugging Face Space**: *(to be added)*
164
- - **Demo video**: *(to be added)*
165
- - **Slide deck**: *(to be added)*
166
- - **Live application URL**: *(same as HF Space URL)*
167
 
168
  ---
169
 
 
19
 
20
  ---
21
 
22
+ ## Long Description (Submission Form — 600-2000 char limit, copy-paste-ready)
23
+
24
+ > **Use this version when filing the lablab.ai Submission Form Long Description field.** Compact, all key points covered (problem, solution, target audience, USP, performance, market, future), comfortably within the 600-2000 character envelope. Char count: **~1880**.
25
+
26
+ ```
27
+ The Problem
28
+ Audit, legal due diligence, tax compliance, and M&A rely on humans reading dozens of documents looking for errors and red flags. A senior auditor needs ~8 hours per 50-page package. ChatGPT/Copilot/Harvey handle one document at a time, hallucinate citations, and lack jurisdiction-specific compliance knowledge.
29
+
30
+ Our Solution: PaperHawk
31
+ PaperHawk is an agentic multi-document intelligence platform processing 3-50 PDFs simultaneously, detecting cross-document inconsistencies humans miss. It combines:
32
+ - 14 deterministic statutory rules (HU VAT Act §169, ISA 240/320/500, GDPR Art. 28, AML, Ptk. 6:98, Art. 22) hand-coded in Python
33
+ - 6-layer anti-hallucination stack (temperature=0, source quotes, confidence scores, plausibility, LLM-risk filters, quote validator)
34
+ - Multi-agent LangGraph orchestration (4 graphs + 6 subgraphs, 5-tool agentic chat)
35
+ - Cross-document red flag detection (e.g. 57.5% price drift across 3 invoices auto-detected)
36
+
37
+ Target Audience
38
+ Auditors, lawyers, tax advisors, DD analysts, compliance officers, CFOs, forensic accountants, banking risk teams. EU + Hungarian focus initially.
39
+
40
+ Why We Win (vs Harvey, ChatPwC, OWL, Copilot)
41
+ These tools handle ONE document well. We handle MANY together — three-way matching, cross-doc consistency, package-level red flags. Plus jurisdiction-specific compliance rules hard-coded, not prompt-engineered. Open-source MIT, self-hostable on AMD MI300X.
42
+
43
+ Performance
44
+ 23.3 sec for 3-document audit (61.7x faster than manual). Qwen 2.5 14B Instruct on AMD MI300X via vLLM (307 t/s prompt, 252 t/s generation, 30.4% prefix cache hit rate).
45
+
46
+ Market & Future
47
+ EU professional services market ~$280B TAM, document workflows ~$45B SAM, HU/CEE audit beachhead ~$2B SOM. Roadmap: NAV eAFA integration, fraud detection (Benford's Law), partner risk scoring, human-in-the-loop M2M validation. SaaS revenue ($500-2k/seat/month) + on-prem enterprise for Big Four.
48
+ ```
49
+
50
+ ---
51
+
52
+ ## Extended Reference Material — Long Description Source (NOT for Submission Form)
53
+
54
+ > The 10-section detailed write-up below is the **source material** for the demo video voiceover, the slide deck (`docs/slides/PaperHawk_Slides.pdf`), and the technical walkthrough README. **Do not paste this into the Submission Form** — it would exceed the 2000-char limit several times over. Keep it here as the canonical "what we built" reference.
55
 
56
  ### The Problem
57
 
 
179
  | Project Title | DONE | `PaperHawk` |
180
  | Short Description | DONE | 247 characters, A+C blend |
181
  | Long Description | DONE | 10 sections, builder-energy tone |
182
+ | Cover Image | DONE | `docs/slides/01_cover.png` (1280 × 720, 16:9) |
183
+ | Slide Presentation | DONE | `docs/slides/PaperHawk_Slides.pdf` (10 slides) |
184
  | Technology & Category Tags | DONE | 12 tags |
185
  | Public GitHub Repository | DONE | `github.com/nandorfivince/paperhawk` |
186
+ | Live HF Space — `Vincsipe/paperhawk` (Plan-B) | DONE | Validated end-to-end 2026-05-05 |
187
+ | Live HF Space — `lablab-ai-amd-developer-hackathon/paperhawk` (Plan-A) | BLOCKED | Org-quota issue, ticket pending |
188
+ | Build-in-Public Posts | TODO at posting time | 4 drafts ready in `docs/social-posts/` |
189
+ | Video Presentation | TODO | Demo walkthrough video (max 3 min) |
190
+ | AMD Developer Experience Feedback | DONE | See section below |
191
+
192
+ ---
193
+
194
+ ## Live Deployment Validation (2026-05-05)
195
+
196
+ An end-to-end live test of the full stack succeeded on the morning of **2026-05-05**, with the following measured results:
197
+
198
+ | Metric | Value |
199
+ |---|---|
200
+ | Audit Demo processing time (3 PDFs) | **23.3 seconds** |
201
+ | Speedup vs manual auditor (24 min estimate) | **61.7×** |
202
+ | vLLM cold-start from snapshot (HF cache preserved) | **~30 seconds** (vs 70 sec clean install) |
203
+ | Prompt throughput | **307 tokens/sec** |
204
+ | Generation throughput | **252 tokens/sec** |
205
+ | Prefix cache hit rate | **30.4%** |
206
+ | Cross-document red flag detected | **57.5% price drift** (78,740 → 124,016 Ft over 3 invoices) |
207
+ | Anti-hallucination quote validator | Caught 4 of 6 hallucinated citations, downgraded confidence |
208
+ | Jurisdictional standards applied | HU VAT Act §169, ISA 500, ISA 320 |
209
+
210
+ The full pipeline ran from a publicly deployed Hugging Face Space (`Vincsipe/paperhawk`) through to the AMD MI300X vLLM endpoint and back, with all 14 deterministic domain checks executing and the package-level cross-doc analyzer correctly identifying the price-drift red flag without human prompting.
211
+
212
+ **Recorded outputs**: 4 screenshots of the winning run (`Screenshot from 2026-05-05 10-07-{15,22,31,37}.png`), usable in the submission video and slides.
213
+
214
+ ---
215
+
216
+ ## AMD Developer Experience Feedback
217
+
218
+ Our team had a generally positive experience deploying our agentic document intelligence platform on AMD's stack. Key feedback by component:
219
+
220
+ ### ROCm 7.0
221
+
222
+ The vLLM 0.17.1 + ROCm 7.0 build was stable out of the box on the Quick Start image. Qwen 2.5 14B Instruct loaded in 17.4 sec to MI300X VRAM (27.6 GB model + 141 GB available KV cache), CUDA graph compilation took 20.5 sec, total cold-start ~70 sec. Production-grade throughput: 307 tokens/sec prompt, 252 tokens/sec generation, 30.4% prefix cache hit rate. The OpenAI-compatible REST endpoint at port 8000 worked transparently. We did not need any ROCm-specific code changes from our development setup — vLLM abstracted everything. **Recommendation**: keep the Quick Start vLLM image fresh; it saved us hours of setup.
223
+
224
+ ### AMD Developer Cloud (DigitalOcean-powered)
225
+
226
+ **Strengths**:
227
+
228
+ - $1.99/hour MI300X pricing is fair and predictable
229
+ - The Quick Start vLLM image saved hours of setup (Docker + ROCm + vLLM pre-installed, JupyterLab launched on port 80)
230
+ - 192 GB HBM3 + 141 GB available KV cache — lots of headroom for large-context multi-agent workloads
231
+ - Snapshot-and-destroy workflow excellent for cost control: $0.32/day storage for ~96 GB snapshot, 5-10 min recreate from snapshot, HF model cache preserved inside the Docker container layer means warm restart is ~30 sec instead of cold-start 70 sec
232
+ - Auto-destroy on credit runout (when no payment method) is a built-in safety net we appreciated
233
+ - Free $100 promo credit makes the platform genuinely accessible to hackathon participants
234
+
235
+ **Pain points and UI improvement opportunities**:
236
+
237
+ 1. Sidebar `GPU Droplets` link in the left navigation routes to the CPU Droplet flow (a clear UI bug — workaround is the homepage `Create a GPU Droplet` card or the top-right `Create` dropdown). We hit this twice in our first hour.
238
+ 2. Default region NYC1 was 'out of capacity' for MI300X plan — we had to switch to ATL1 via URL parameter (`?region=atl1`). The region selector on the GPU Droplet creation page does not appear to be exposed in the UI; we found the workaround by inspecting the URL of a successful creation. Adding region availability indicators on the GPU Plan selector would help.
239
+ 3. Reboot after `apt-get upgrade` (recommended via Security notice) does not auto-restart the `rocm` Docker container; we needed `docker start rocm` manually (sketch below). Worth documenting in the Quick Start onboarding.
240
+
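+ The one-off fix, plus a restart policy that would make it stick (the container name `rocm` comes from the Quick Start image; the policy change is our suggestion, not something we verified on the image):
+
+ ```bash
+ docker start rocm                             # one-off, after the upgrade reboot
+ docker update --restart unless-stopped rocm   # would survive future reboots
+ ```
+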
241
+ ### AMD APIs
242
+
243
+ We did not use the lower-level ROCm APIs or AMD-specific SDKs directly. Our stack was vLLM + OpenAI-compatible REST, so all hardware-specific work was abstracted away through standard Python tooling. This is actually a strength: we ran a production-grade PaperHawk pipeline (originally developed against the Anthropic Claude API) on AMD MI300X with **zero application code changes**, proving the AMD stack via vLLM is a real drop-in alternative for production AI workloads. We changed only environment variables (`LLM_PROFILE`, `VLLM_BASE_URL`, `VLLM_API_KEY`, `VLLM_MODEL`); the full swap is sketched below.
244
+
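+ The entire swap, expressed as the env config we deployed with (same values as the Space variables in the deployment guide):
+
+ ```bash
+ export LLM_PROFILE=vllm
+ export VLLM_BASE_URL=http://<MI300X_DROPLET_IP>:8000/v1
+ export VLLM_MODEL=Qwen/Qwen2.5-14B-Instruct
+ export VLLM_API_KEY=sk-paperhawk-2026   # must match vLLM's --api-key byte-for-byte
+ ```
+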
245
+ ### Overall verdict
246
+
247
+ AMD MI300X via the Developer Cloud is a viable production deployment platform for agentic LLM applications. The Quick Start vLLM image is a major time-saver. The few UI bugs and capacity-region issues are minor compared to the platform's strengths. The combination of $1.99/hour MI300X pricing + snapshot-restore workflow + OpenAI-compatible vLLM endpoint makes this a credible alternative to AWS p4d/p5 or GCP A3 for inference workloads, especially at the price point.
248
 
249
  ---
250
 
251
  ## Submission URLs (filled at submission time)
252
 
253
+ ### Plan-A (lablab-org admin responded) — preferred
254
+
255
  - **GitHub repo**: https://github.com/nandorfivince/paperhawk
256
+ - **Hugging Face Space (official)**: https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/paperhawk
257
+ - **Live application URL**: same as HF Space URL above
258
+ - **Slide deck**: `docs/slides/PaperHawk_Slides.pdf`
259
+ - **Demo video**: *(uploaded at submission time)*
260
+
261
+ ### Plan-B (lablab-org quota unresolved) — fallback
262
+
263
+ - **GitHub repo**: https://github.com/nandorfivince/paperhawk
264
+ - **Hugging Face Space (working, parallel)**: https://huggingface.co/spaces/Vincsipe/paperhawk
265
+ - **Live application URL**: same as HF Space URL above
266
+ - **Slide deck**: `docs/slides/PaperHawk_Slides.pdf`
267
+ - **Demo video**: *(uploaded at submission time)*
268
+
269
+ **Plan-B trade-off**: HF Special Prize (Reachy Mini robot + HF PRO + $500 credits) requires the Space to be under the `lablab-ai-amd-developer-hackathon` org. If we ship under `Vincsipe/paperhawk`, we forfeit the HF Special Prize but retain qualification for the four main judging criteria (Presentation, Business Value, Application of Technology, Originality).
270
 
271
  ---
272
 
docs/hf-space-deployment.md DELETED
@@ -1,124 +0,0 @@
1
- # Hugging Face Space deployment
2
-
3
- The Streamlit app deploys to a **Hugging Face Space** under the
4
- `lablab-ai-amd-developer-hackathon` organization. This is **mandatory** for
5
- the Hugging Face Special Prize and convenient as the public demo URL.
6
-
7
- ## 1. Prerequisites
8
-
9
- - Hugging Face account
10
- - Membership in the **AMD Developer Hackathon** HF organization
11
- ([join here](https://huggingface.co/login?next=%2Forganizations%2Flablab-ai-amd-developer-hackathon%2Fshare%2FELARrxoRIHvseSHRhANJYFEZQazsQIYhJf))
12
- - A running vLLM endpoint on the AMD MI300X (see `qwen-vllm-deployment.md`)
13
-
14
- ## 2. Create the Space
15
-
16
- 1. Hugging Face → Spaces → New Space
17
- 2. Owner: `lablab-ai-amd-developer-hackathon`
18
- 3. Space name: `paperhawk`
19
- 4. License: MIT
20
- 5. SDK: **Streamlit**
21
- 6. Hardware: **CPU basic** (free) — vLLM runs on MI300X, the Space only hosts the UI
22
-
23
- ## 3. Push the code
24
-
25
- ```bash
26
- git remote add space https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/paperhawk
27
- git push space main
28
- ```
29
-
30
- The Space auto-builds from the repo using `requirements.txt` and runs
31
- `app.py` (or, in our layout, configures Streamlit to start `app/main.py`).
32
-
33
- ## 4. Set Space env vars
34
-
35
- In the Space → Settings → Variables and secrets, add:
36
-
37
- ```
38
- LLM_PROFILE=vllm
39
- VLLM_BASE_URL=http://<mi300x-public-ip>:8000/v1
40
- VLLM_MODEL=Qwen/Qwen2.5-14B-Instruct
41
- VLLM_API_KEY=<the api key you set on the vLLM server>
42
- EMBEDDING_MODEL=BAAI/bge-m3
43
- ```
44
-
45
- Mark `VLLM_API_KEY` as a **secret** (not a regular variable).
46
-
47
- ## 5. Space front-matter
48
-
49
- Edit the `README.md` to start with the HF Spaces front-matter:
50
-
51
- ```yaml
52
- ---
53
- title: Document Intelligence (AMD Edition)
54
- emoji: 🔍
55
- colorFrom: red
56
- colorTo: yellow
57
- sdk: streamlit
58
- sdk_version: 1.40.0
59
- app_file: app/main.py
60
- pinned: false
61
- license: mit
62
- short_description: Multi-document due diligence with LangGraph + Qwen on AMD MI300X
63
- tags:
64
- - langgraph
65
- - agentic
66
- - rag
67
- - qwen
68
- - amd
69
- - document-intelligence
70
- ---
71
- ```
72
-
73
- (The current README.md is the project README; this front-matter goes on top
74
- when the repo is mirrored to the HF Space.)
75
-
76
- ## 6. Verify the Space
77
-
78
- After the build finishes (~3-5 minutes):
79
-
80
- 1. Open `https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/paperhawk`
81
- 2. Click the **Audit Demo** button → it should run end-to-end and produce
82
- risks + a report.
83
- 3. Open the **Chat** tab → ask a question → the answer should include
84
- `[Source: filename.pdf]` citations.
85
-
86
- ## 7. Resource tier
87
-
88
- The free CPU basic tier (16 GB RAM, 2 vCPU) handles:
89
-
90
- - BGE-m3 embedding (~2.3 GB on first load)
91
- - ChromaDB (small index)
92
- - Streamlit UI
93
-
94
- The vLLM model runs on the MI300X, **not** here. The Space just renders the
95
- UI and proxies requests to the vLLM endpoint.
96
-
97
- If the free tier is too tight on memory, upgrade to **CPU upgrade** ($0.03/h).
98
-
99
- ## 8. Sleep mode mitigation
100
-
101
- A free Space sleeps after 48 hours of inactivity. The first request after
102
- sleep takes ~30-60 seconds to wake. Mitigations:
103
-
104
- - Share the Space link in your Build-in-Public posts → continuous traffic →
105
- less likely to sleep.
106
- - Set up a 30-minute external ping (e.g. UptimeRobot) the day before
107
- judging.
108
-
109
- ## 9. The HF Special Prize is like-driven
110
-
111
- Once the Space is live:
112
-
113
- 1. Share the URL on X / LinkedIn (tag `@lablab` and `@AIatAMD`).
114
- 2. Ask your followers to like the Space.
115
- 3. The Space with the most likes at the end of the hackathon wins:
116
- - 1st: Reachy Mini Wireless robot + 6 months HF PRO + $500 HF credit
117
- - 2nd: 3 months HF PRO + $300 credit
118
- - 3rd: 2 months HF PRO + $200 credit
119
-
120
- ## 10. Submission to lablab
121
-
122
- When submitting on lablab.ai, paste the Space URL into the **Application
123
- URL** and **Hugging Face Space link** fields. This is mandatory for the HF
124
- prize qualification.

docs/qwen-vllm-deployment.md DELETED
@@ -1,68 +0,0 @@
1
- # Qwen on AMD MI300X — vLLM deployment
2
-
3
- This guide covers the production deployment path: running Qwen 2.5 Instruct
4
- (14B or 32B) via [vLLM](https://github.com/vllm-project/vllm) on an
5
- **AMD Instinct MI300X** through the AMD Developer Cloud, with the Streamlit
6
- app calling the vLLM endpoint over the OpenAI-compatible REST API.
7
-
8
- For the canonical step-by-step (including the docker run command and a
9
- benchmark table), see [`infra/vllm/README.md`](../infra/vllm/README.md).
10
-
11
- ## Why this stack?
12
-
13
- - **Open source LLM** — Qwen 2.5 is Apache-2 licensed; safe for the MIT
14
- open-source license here, and a partner-prize bonus on the hackathon.
15
- - **Multilingual** — Qwen 2.5 handles HU/DE/EN well, which matters for our
16
- multilingual demo data.
17
- - **AMD-native** — vLLM has a ROCm build (`rocm/vllm:latest`) optimized for
18
- the MI300X. No CUDA, no NVIDIA dependency.
19
- - **OpenAI-compatible API** — `langchain-openai`'s `ChatOpenAI` adapter
20
- works out of the box with a custom `base_url`. Tool-calling, structured
21
- output, and streaming all behave the same as the public OpenAI endpoint.
22
- - **No vendor lock-in** — the same code runs against Ollama (locally) and
23
- against any OpenAI-compatible inference server.
24
-
25
- ## Cost monitoring
26
-
27
- AMD Developer Cloud pricing (May 2026 ballpark):
28
-
29
- - ~$4-8/hour pay-as-you-go for an MI300X instance.
30
- - Each team member gets `$100` in cloud credits → ~20 hours of MI300X uptime
31
-   at $5/h. With 3 team members, ~60 hours total.
32
-
33
- **Discipline:**
34
-
35
- 1. Only run during demo / test / build sessions; **stop the instance when
36
- idle**.
37
- 2. Keep one teammate's credit untouched as a final-day buffer.
38
- 3. Run end-to-end smoke tests early — a hot fix on deadline day burns hours
39
- you can't get back.
40
-
41
- ## Plan B: Ollama fallback
42
-
43
- If the AMD credit doesn't arrive in time, or the MI300X has a network issue
44
- on demo day:
45
-
46
- ```bash
47
- LLM_PROFILE=ollama OLLAMA_MODEL=qwen2.5:7b-instruct streamlit run app/main.py
48
- ```
49
-
50
- Pull the model first:
51
-
52
- ```bash
53
- ollama pull qwen2.5:7b-instruct
54
- ```
55
-
56
- Quality drops (7B vs 14B/32B), but the demo flow stays alive on a laptop
57
- GPU or even CPU.
58
-
59
- ## Production hardening (post-hackathon)
60
-
61
- For an actual production deployment beyond the hackathon scope:
62
-
63
- - TLS termination (Caddy / Nginx in front of vLLM)
64
- - API-key rotation (`--api-key` flag with a periodic rotation script)
65
- - Prometheus + Grafana on vLLM `/metrics`
66
- - `--quantization fp8` to fit a larger model on smaller hardware
67
- - `--enable-prefix-caching` for repeated long system prompts
68
- - Multi-GPU / multi-region scaling via SkyPilot or vLLM Production Stack

docs/social-posts/post-1-build-window-opens.md DELETED
@@ -1,165 +0,0 @@
1
- # Build in Public · Post 1 — Build Window Opens
2
-
3
- **Timing**: post on or just after the AMD Hackathon kick-off (May 4, 6:00 PM CEST).
4
- **Order**: post on **X first**, then LinkedIn ~30 minutes later.
5
- **Why**: X moves fast, LinkedIn rewards a slightly longer-form follow-up.
6
-
7
- This is the first of three planned Build-in-Public posts:
8
-
9
- 1. **Post 1** (this file) — build window opens · stack-introduction · GitHub link
10
- 2. **Post 2** (mid-week, ~May 7-8) — technical deep-dive on one design choice (LangGraph Send-API parallelism for the deterministic check fan-out)
11
- 3. **Post 3** (May 10, after submit) — final demo · HF Space · pitch-recap
12
-
13
- Mandatory tags ([per the official Build in Public requirement](https://lablab.ai/event/amd-developer-hackathon)):
14
-
15
- | Platform | Required tags |
16
- |---|---|
17
- | X | `@lablab` + `@AIatAMD` |
18
- | LinkedIn | `lablab.ai` + `AMD Developer` (showcase pages) |
19
-
20
- ---
21
-
22
- ## Variant A — X (Twitter)
23
-
24
- > Character budget: 280 — version below uses 269 chars including handles + hashtags.
25
-
26
- ```
27
- Build window opens.
28
-
29
- Putting our LangGraph-native, multi-agent document intelligence
30
- platform on AMD Instinct MI300X for the @AIatAMD x @lablab
31
- hackathon.
32
-
33
- Qwen 2.5 14B on vLLM. 14 deterministic domain checks. 5+1
34
- anti-halluc layers. MIT, public.
35
-
36
- → github.com/nandorfivince/paperhawk
37
-
38
- #AMDHackathon #BuildInPublic
39
- ```
40
-
41
- ### X variant alternatives (in case the first doesn't fit)
42
-
43
- **Punchy / 240 char:**
44
-
45
- ```
46
- PaperHawk — multi-agent document intelligence on @AIatAMD MI300X.
47
-
48
- Qwen 2.5 14B + LangGraph 0.6 + 14 deterministic domain checks.
49
- Build window starts now for the @lablab hackathon.
50
-
51
- Open source · MIT · public repo.
52
-
53
- → github.com/nandorfivince/paperhawk
54
-
55
- #AMDHackathon #BuildInPublic
56
- ```
57
-
58
- **Tech-detail / 270 char:**
59
-
60
- ```
61
- We built PaperHawk: 4 LangGraph graphs, 6 subgraphs, 14
62
- deterministic domain checks, multi-agent DD assistant.
63
-
64
- Now porting it to @AIatAMD Instinct MI300X via vLLM for the
65
- @lablab hackathon.
66
-
67
- Qwen 2.5 14B inside. MIT, public.
68
-
69
- → github.com/nandorfivince/paperhawk
70
-
71
- #AMDHackathon #BuildInPublic
72
- ```
73
-
74
- ---
75
-
76
- ## Variant B — LinkedIn (long form)
77
-
78
- > Character budget: 3000. Version below is ~1280 chars + tags. Reads as a proper builder-energy update for technical recruiters and AI-engineering peers.
79
-
80
- ```
81
- Build window opens.
82
-
83
- For the next week we're putting PaperHawk — our LangGraph-native,
84
- multi-agent document intelligence platform — on AMD Instinct MI300X
85
- GPUs for the AMD Developer Hackathon × lablab.ai.
86
-
87
- The premise is simple: most "document AI" today is RAG with extra
88
- steps. Retrieve a passage, summarize it, hope it's right. That's
89
- fine for FAQ chatbots. It's not fine for auditors, due-diligence
90
- teams, or anyone who has to cross-reference a folder of contracts
91
- and invoices and trust the answer.
92
-
93
- PaperHawk is built for the second case:
94
-
95
- → 4 compiled LangGraph 0.6 graphs (pipeline / chat / DD / package)
96
- → 14 deterministic domain checks (ISA 240/500/320, GDPR Article 28,
97
- Incoterms 2020, AML sanctions)
98
- → 5+1 anti-hallucination layers — every LLM claim must cite a
99
- verbatim quote from the document, or it gets dropped
100
- → 5-tool agentic chat with strict [Source: filename.pdf] citations
101
- → Multi-agent DD assistant: 4 specialists + supervisor + synthesizer
102
-
103
- Stack:
104
- → Qwen 2.5 14B Instruct served via vLLM on AMD MI300X (ROCm)
105
- → BAAI/bge-m3 multilingual embeddings
106
- → Streamlit 5-tab UI, deployable as a Hugging Face Space
107
- → MIT licensed, English-first, multilingual fallback
108
-
109
- Three of us have shipped together for nearly a decade. We're not
110
- new to building things. We're using this hackathon to put our
111
- agentic DI platform on AMD's open compute stack and see how far it
112
- goes.
113
-
114
- We'll be sharing a technical walkthrough mid-week — including why
115
- LangGraph's Send-API parallelism beat sequential domain dispatch in
116
- our benchmarks.
117
-
118
- Repo (public): https://github.com/nandorfivince/paperhawk
119
-
120
- #AMDHackathon #BuildInPublic #LangGraph #Qwen #AMDInstinct #lablab
121
- ```
122
-
123
- **Don't forget**: in the LinkedIn post composer, **tag the company pages**:
124
-
125
- - `lablab.ai` → https://www.linkedin.com/company/lablab-ai/
126
- - `AMD Developer` (showcase page) → https://www.linkedin.com/showcase/amd-developer/
127
-
128
- These appear as `@lablab.ai` and `@AMD Developer` in the post — LinkedIn auto-completes them when you start typing.
129
-
130
- ---
131
-
132
- ## Image / media to attach
133
-
134
- For both X and LinkedIn, attach **one image**: the cover slide from the deck.
135
-
136
- ```bash
137
- # Generate it from slides.html (see docs/slides/README.md for the script):
138
- python -c "<<see docs/slides/README.md cover-PNG snippet>>"
139
- # Output: docs/slides/01_cover.png
140
- ```
141
-
142
- Alternative for X (which compresses heavily): use the `paperhawk.jpeg` directly — it's already wide-format (2048×819) and reads well on mobile.
143
-
144
- ---
145
-
146
- ## Posting checklist
147
-
148
- | Step | Status |
149
- |---|---|
150
- | Cover image generated (`docs/slides/01_cover.png`) | TODO before posting |
151
- | GitHub repo public + README hero visible | DONE |
152
- | `@lablab` + `@AIatAMD` typed correctly on X | TODO at post-time |
153
- | `lablab.ai` + `AMD Developer` company pages tagged on LinkedIn | TODO at post-time |
154
- | Repo URL works in private/incognito browser (sanity-check public visibility) | TODO before posting |
155
- | `#AMDHackathon` `#BuildInPublic` hashtags both included | DONE |
156
-
157
- ---
158
-
159
- ## What this post is NOT
160
-
161
- - Not a marketing pitch. It's a technical announcement.
162
- - Not "we hope to win". It's "we built this, here's what it does, watch this space."
163
- - Not asking for likes. The HF Space is where like-voting happens (different track / different prize).
164
-
165
- The job of this post: **plant a flag**. We're building. We're public. We've shipped together before. Now we're doing it on AMD GPUs.