Nándorfi Vince committed on
Commit
3385e0e
·
1 Parent(s): 67b464c

Sync documentation overhaul from main (markdown only, LFS history preserved)

README.md CHANGED
@@ -10,159 +10,152 @@ short_description: Real-DI-Audit/14 rules/6 anti-halluc/LangGraph/Qwen/MI300X
10
  ---
11
 
12
  <p align="center">
13
- <img src="paperhawk.jpeg" alt="PaperHawk" width="900">
14
  </p>
15
 
16
  <h1 align="center">PaperHawk</h1>
17
 
18
  <p align="center">
19
  <strong>Agentic document intelligence on AMD MI300X</strong><br>
20
- Multi-document due diligence with deterministic domain checks and agentic LLM workflows.
21
  </p>
22
 
23
  <p align="center">
24
- <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
25
  <img src="https://img.shields.io/badge/python-3.12+-blue.svg" alt="Python">
26
  <img src="https://img.shields.io/badge/LangGraph-0.6-green.svg" alt="LangGraph">
27
  <img src="https://img.shields.io/badge/AMD-MI300X-red.svg" alt="AMD MI300X">
 
28
  </p>
29
 
30
  <p align="center">
31
- Built for the <a href="https://lablab.ai/event/amd-developer-hackathon"><strong>AMD Developer Hackathon × lablab.ai</strong></a> (May 2026).
32
  </p>
33
 
34
  ---
35
 
36
- ## What is this?
37
 
38
- A working AI system that ingests multiple business documents (invoices,
39
- contracts, delivery notes, purchase orders, financial reports) and:
40
 
41
- - **Extracts structured data** with anti-hallucination layers (5+1 stack)
42
- - **Detects risks** via 14 deterministic domain rules + LLM ensemble
43
- - **Cross-references documents** (three-way matching for audits, M&A DD)
44
- - **Answers questions** via 5-tool agentic chat with source citations
45
- - **Generates audit-ready reports** (DOCX export, JSON API)
46
 
47
- This is **not "just another RAG"** — it is a multi-agent orchestration of
48
- specialist nodes (audit / legal / compliance / financial) over a deterministic
49
- + LLM ensemble, with explicit anti-hallucination layers.
50
 
51
- ## Stack
52
 
53
- | Layer | Technology |
54
- |-------|------------|
55
- | Orchestration | **LangGraph 0.6** (4 graphs, 6 subgraphs, async-first, AsyncSqliteSaver) |
56
- | LLM | **Qwen 2.5 14B Instruct** via vLLM on **AMD Instinct MI300X** |
57
- | Embedding | **BAAI/bge-m3** (multilingual, 1024 dim, sentence-transformers) |
58
- | Vector store | **ChromaDB + BM25** hybrid (Reciprocal Rank Fusion) |
59
- | UI | **Streamlit** (5 tabs) — deployable as a **Hugging Face Space** |
60
- | Testing | pytest + Playwright |
61
 
62
- ## Architecture
63
 
64
- ```
65
- ┌─────────────────────────────────┐
66
- │ Streamlit UI (5 tabs) │
67
- └────────────┬────────────────────┘
68
-
69
- ┌────────────────────────┼────────────────────────┐
70
- │ │ │
71
- ┌───────▼──────┐ ┌────────▼────────┐ ┌──────▼──────┐
72
- pipeline │ │ chat_graph │ │ dd_graph │
73
- │ _graph │ │ (5 tools, 17 │ │ (multi- │
74
- (6 subgraphs)│ │ rule prompt) │ │ agent │
75
- └───────┬──────┘ └─────────────────┘ │ super-
76
- │ │ visor)
77
- │ ┌─────────────────────────┐ └─────────────┘
78
- ├──▶ ingest_subgraph │
79
- ├──▶ classify (per-doc) │
80
- ├──▶ extract_subgraph │
81
- ├──▶ rag_index_subgraph │
82
- ├──▶ compare_node (3-way) │
83
- └──▶ risk_subgraph │
84
- ├─ basic risk │
85
- ├─ 14 domain checks │
86
- ├─ LLM risk + 3 filters │
87
- ├─ plausibility │
88
- └─ duplicate (ISA 240) │
89
- ```
90
 
91
- See [ARCHITECTURE.md](ARCHITECTURE.md) for the full architecture.
92
 
93
- ## Quick start
94
 
95
- ### 1. Local dev (Ollama or dummy mode)
 
96
 
97
  ```bash
98
- git clone https://github.com/<YOUR_GH_USER>/document-intelligence-agentic-langgraph-amd
99
- cd document-intelligence-agentic-langgraph-amd
100
- python -m venv .venv && source .venv/bin/activate
101
- pip install -r requirements.txt
102
- cp .env.example .env
103
- # Edit .env: set LLM_PROFILE=dummy (no LLM) or LLM_PROFILE=ollama (Qwen 7B local)
104
-
105
- streamlit run app/main.py
106
  ```
107
 
108
- ### 2. Production (Qwen on AMD MI300X via vLLM)
 
109
 
110
  ```bash
111
- # On the AMD Developer Cloud MI300X instance:
112
- docker run --rm --device=/dev/kfd --device=/dev/dri --group-add video \
113
- --ipc=host --shm-size 16g \
114
- -p 8000:8000 \
115
- -e VLLM_MODEL=Qwen/Qwen2.5-14B-Instruct \
116
- rocm/vllm:latest \
117
- sh -c 'vllm serve $VLLM_MODEL --host 0.0.0.0 --port 8000 \
118
- --tensor-parallel-size 1 --max-model-len 32768'
119
-
120
- # On your machine (.env):
121
- LLM_PROFILE=vllm
122
- VLLM_BASE_URL=http://<mi300x-public-ip>:8000/v1
123
- VLLM_MODEL=Qwen/Qwen2.5-14B-Instruct
124
-
125
- streamlit run app/main.py
126
  ```
127
 
128
- See [docs/qwen-vllm-deployment.md](docs/qwen-vllm-deployment.md) for the full
129
- walkthrough including cost monitoring and a Plan B (Ollama fallback).
 
130
 
131
- ### 3. Hugging Face Space deploy
 
132
 
133
- See [docs/hf-space-deployment.md](docs/hf-space-deployment.md).
134
 
135
- ## Demo packages
136
 
137
- Three pre-built demo packages bundled in `test_data/`:
 
 
138
 
139
- - **Audit Demo** 3 invoices from the same supplier; the March one is 50%
140
- pricier (over-billing pattern detected by the package-level analyzer).
141
- - **DD Demo** — NDA + service agreement + amendment in an acquisition
142
- scenario (change-of-control + auto-renewal red flags).
143
- - **Compliance Demo** — 2 contracts; one is missing the GDPR Article 28 clause.
144
 
145
- Click the corresponding button on the **Upload** tab.
146
 
147
  ## Documentation
148
 
149
- - [ARCHITECTURE.md](ARCHITECTURE.md) architecture overview (English)
150
- - [docs/qwen-vllm-deployment.md](docs/qwen-vllm-deployment.md) — Qwen on AMD MI300X (English)
151
- - [docs/hf-space-deployment.md](docs/hf-space-deployment.md) Hugging Face Space deploy (English)
152
- - [docs/LANGGRAPH_ONBOARDING.md](docs/LANGGRAPH_ONBOARDING.md) onboarding for contributors (English)
153
- - [CLAUDE.md](CLAUDE.md) project-level Claude Code instructions
154
- - [NOTICE.md](NOTICE.md) — author intent (non-binding)
155
- - `docs/Teljes-rendszer-attekintes-langgraph_HU.md` — legacy Hungarian system overview (reference)
156
- - `docs/MUKODESI_LEIRAS_HU.md` — legacy Hungarian operations manual (reference)
157
 
158
- ## Built by
159
 
160
- **Team CsimpiCsirkek** for the AMD Developer Hackathon × lablab.ai (2026):
161
-
162
- - Nándorfi Vince
163
- - Vitai Tamás
164
- - Murcsik Gábor
165
 
166
  ## License
167
 
168
- **MIT** — see [LICENSE](LICENSE).
 
10
  ---
11
 
12
  <p align="center">
13
+ <img src="https://raw.githubusercontent.com/nandorfivince/paperhawk/main/paperhawk.jpeg" alt="PaperHawk" width="900">
14
  </p>
15
 
16
  <h1 align="center">PaperHawk</h1>
17
 
18
  <p align="center">
19
  <strong>Agentic document intelligence on AMD MI300X</strong><br>
20
+ Multi-document due diligence with deterministic compliance rules and a 6-layer anti-hallucination stack.
21
  </p>
22
 
23
  <p align="center">
24
+ <img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT">
25
  <img src="https://img.shields.io/badge/python-3.12+-blue.svg" alt="Python">
26
  <img src="https://img.shields.io/badge/LangGraph-0.6-green.svg" alt="LangGraph">
27
  <img src="https://img.shields.io/badge/AMD-MI300X-red.svg" alt="AMD MI300X">
28
+ <img src="https://img.shields.io/badge/Qwen-2.5%2014B-purple.svg" alt="Qwen 2.5 14B">
29
  </p>
30
 
31
  <p align="center">
32
+ Built for the <strong>AMD Developer Hackathon × lablab.ai</strong> (May 2026).
33
  </p>
34
 
35
  ---
36
 
37
+ ## What is PaperHawk?
38
 
39
+ PaperHawk is an **agentic multi-document intelligence platform** for auditors, lawyers, tax advisors, and DD analysts. It processes 3–50 PDFs simultaneously and detects **cross-document red flags humans miss** — like a 57.5% price drift across three invoices from the same supplier — using a multi-agent LangGraph orchestration on top of Qwen 2.5 14B Instruct served via vLLM on AMD Instinct MI300X.
 
40
 
41
+ It is **not** a chatbot. It is a typed-state, multi-graph reasoning system with deterministic compliance rules, verbatim source citations, and a quote validator that catches LLM hallucinations before they reach the user.
 
42
 
43
+ ## Why it matters
 
 
44
 
45
+ A senior auditor needs ~8 hours to thoroughly review a 50-page invoice/contract package. ChatGPT, Copilot, and Harvey handle one document at a time, hallucinate citations, and lack jurisdiction-specific compliance knowledge. PaperHawk handles the entire package, applies 14 statutory rules hand-coded in Python, and finishes a 3-document audit in **23.3 seconds** (61.7× faster than manual review) — with auditor-grade citations and ISA/GDPR/HU-VAT mappings.
46
 
47
+ ---
 
48
 
49
+ ## Technical highlights
50
 
51
+ - **Multi-agent LangGraph 0.6 orchestration** — 4 compiled graphs (pipeline, chat, DD, package_insights) + 6 reusable subgraphs with Send-API parallelism
52
+ - **5-tool agentic chat** with strict `[Source: filename.pdf]` citations validated by a post-processor (no provenance → no answer)
53
+ - **6-layer anti-hallucination stack** — `temperature=0`, verbatim source quotes, field-level confidence, plausibility validators, 3-stage LLM-risk filter chain, quote validator
54
+ - **Provider abstraction** with `configurable_alternatives` — vLLM (production) / Ollama (local dev) / dummy (CI) — swap with one env var, zero code changes
55
+ - **AMD Instinct MI300X via vLLM** — 192 GB HBM3, 27.6 GB model + 141 GB available KV cache, 307 t/s prompt + 252 t/s generation, 30.4% prefix cache hit rate
56
+ - **61.7× speedup** vs manual audit on a 3-document package (23.3 sec vs ~24 min)
57
+ - **Hugging Face Space deployable** with Docker SDK + Git LFS for binary assets
58
+
59
+ ## Domain highlights
60
+
61
+ - **14 deterministic statutory rules** hand-coded in Python (NOT prompt-engineered) — ISA 240/320/500 audit standards, HU VAT Act §169 mandatory invoice elements, Ptk. 6:98 disproportionate penalty clauses, Art. 22 tax-ID validation, GDPR Article 28 sub-processor language, Incoterms 2020, AML sanctions list (EU/OFAC fuzzy match)
62
+ - **Cross-document red flag detection** — three-way matching (invoice + delivery note + PO), package-level pricing anomalies, duplicate-invoice detection (ISA 240), change-of-control trigger detection (M&A DD)
63
+ - **Multi-agent DD assistant** — 4 specialists (audit / legal / compliance / financial) coordinated by a supervisor and a synthesizer for executive summaries
64
+ - **Auditor-grade citations** — every finding maps to a regulation source (HU VAT Act §169, ISA 500, GDPR Art. 28, etc.) with verbatim source quote
65
+ - **Multilingual ingest** — EN / HU / DE OCR via Tesseract, native PDF + DOCX, vision-first scanned-PDF fallback
66
+
67
+ ---
68
+
69
+ ## Try the live demo
 
70
 
71
+ **Public Hugging Face Space** (no signup, runs in browser):
72
 
73
+ <https://huggingface.co/spaces/Vincsipe/paperhawk>
74
 
75
+ Click **Audit Demo** in the Quick demo section. Three pre-bundled invoices process in ~25 seconds, and you'll see the cross-doc 57.5% price-drift flag, the 14 deterministic checks, and the auditor-grade citations.
76
+
77
+ Backed by an AMD MI300X vLLM endpoint serving Qwen 2.5 14B Instruct.
78
+
79
+ ---
80
+
81
+ ## Run it locally
82
+
83
+ Two options depending on whether you have a GPU or just want a quick smoke test.
84
+
85
+ ### Quick demo (~3 minutes, no GPU needed)
86
+
87
+ Uses the **deterministic dummy provider** — runs the full pipeline, all 14 domain checks, and the multi-agent orchestration without any LLM calls. Good for verifying the system runs end-to-end.
88
 
89
  ```bash
90
+ git clone https://github.com/nandorfivince/paperhawk
91
+ cd paperhawk
92
+ make install
93
+ LLM_PROFILE=dummy make dev
 
94
  ```
95
 
96
+ Open <http://localhost:8501> → **Audit Demo** button. Result in ~5 seconds (dummy provider returns deterministic test data).
97
+
98
+ ### Full demo (~10 minutes, ~16 GB VRAM recommended)
99
+
100
+ Uses **Ollama with Qwen 2.5 14B Instruct** (the same model we deployed to AMD MI300X via vLLM). On a consumer GPU such as an NVIDIA RTX 4090 (24 GB VRAM) or RTX PRO 4500 (32 GB VRAM), you'll see real, production-grade multi-agent reasoning.
101
 
102
  ```bash
103
+ git clone https://github.com/nandorfivince/paperhawk
104
+ cd paperhawk
105
+ make install
106
+
107
+ # Pull the model (one-time, ~9 GB download)
108
+ ollama pull qwen2.5:14b-instruct
109
+
110
+ # Run the app pointed at Ollama
111
+ LLM_PROFILE=ollama OLLAMA_MODEL=qwen2.5:14b-instruct \
112
+ streamlit run app/main.py --server.port=8501 --server.fileWatcherType=none
 
113
  ```
114
 
115
+ Open <http://localhost:8501> → **Audit Demo** button.
116
+
117
+ **Expected results on an RTX PRO 4500 (32 GB GDDR7)**:
118
 
119
+ - Audit Demo: ~80 seconds for 3 invoices, 17.5× speedup vs manual
120
+ - 8 risk findings (2 HIGH, 4 MEDIUM, 2 LOW), HU VAT Act §169 mappings
121
+ - Cross-doc package-level analyzer flags the 57.5% price-drift red flag
122
+ - Quote validator catches 4 of 6 hallucinated citations and downgrades them to `low` confidence
123
 
124
+ (On AMD MI300X via vLLM: ~23 seconds, a 61.7× speedup, roughly 3.4× faster than the Ollama run above.)
125
 
126
+ ### Docker compose (alternative)
127
 
128
+ ```bash
129
+ make run-local
130
+ ```
131
 
132
+ Spins up the Streamlit app + Ollama in containers. First run pulls the model (~9 GB).
 
133
 
134
+ ---
135
 
136
  ## Documentation
137
 
138
+ | Document | What it covers |
139
+ |---|---|
140
+ | [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) | LangGraph multi-graph design, 14 domain checks, anti-hallucination stack, multi-agent DD |
141
+ | [`docs/AMD_DEPLOYMENT.md`](docs/AMD_DEPLOYMENT.md) | How we deployed Qwen 2.5 14B via vLLM on AMD Instinct MI300X (DigitalOcean-powered AMD Developer Cloud) |
142
+ | [`docs/HUGGINGFACE_DEPLOYMENT.md`](docs/HUGGINGFACE_DEPLOYMENT.md) | How we deployed the Streamlit app as a public Hugging Face Space |
 
143
 
144
+ For the full submission brief with TAM/SAM, competitor analysis, and the live deployment validation results, see [`docs/SUBMISSION.md`](docs/SUBMISSION.md).
145
 
146
+ ---
 
147
 
148
  ## License
149
 
150
+ MIT — see [`LICENSE`](LICENSE). Use, fork, deploy commercially or non-commercially.
151
+
152
+ ## Built by
153
+
154
+ **Team csimpicsirkek** (`PÁKÁK the AI warriors!` on the lablab.ai platform):
155
+
156
+ - Vince Nándorfi — lead, LangGraph architecture, AMD adaptation
157
+ - Erika Nagy — silent partner
158
+ - Tamás Vitai
159
+ - Gábor Murcsik
160
+
161
+ For the AMD Developer Hackathon × lablab.ai, May 2026.
docs/AMD_DEPLOYMENT.md ADDED
@@ -0,0 +1,265 @@
 
1
+ # AMD MI300X Deployment
2
+
3
+ How we deployed Qwen 2.5 14B Instruct via vLLM on AMD Instinct MI300X using the AMD Developer Cloud (DigitalOcean-powered). End-to-end, with copy-paste commands and the costs we actually paid.
4
+
5
+ ---
6
+
7
+ ## What you get
8
+
9
+ - **AMD Instinct MI300X** — 192 GB HBM3 GPU, 20 vCPU, 240 GB RAM, 720 GB NVMe boot disk
10
+ - **vLLM 0.17.1 + ROCm 7.0** — pre-installed via the Quick Start image
11
+ - **OpenAI-compatible REST endpoint** at `http://<droplet-ip>:8000/v1`
12
+ - **Cost**: $1.99 / GPU / hour. Free $100 credit covers ~50 hours.
13
+
14
+ ---
15
+
16
+ ## Prerequisites
17
+
18
+ 1. **AMD AI Developer Program signup** — <https://www.amd.com/en/developer/ai-dev-program.html>
19
+ - Approval takes 1–2 business days; you receive a $100 cloud credit by email automatically
20
+ 2. **lablab.ai event enrollment** (for hackathon participants) — <https://lablab.ai/event/amd-developer-hackathon>
21
+ 3. **SSH key on your local machine** (we recommend a dedicated key, not your default GitHub key — see step 1 below)
22
+
23
+ ---
24
+
25
+ ## Step 1 — Generate a dedicated SSH key
26
+
27
+ The default `~/.ssh/id_ed25519` is often passphrase-protected and routed through a GNOME-keyring agent that interferes with non-interactive `ssh-add`. Sidestep it with a passphrase-less, dedicated key:
28
+
29
+ ```bash
30
+ ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_amd_paperhawk -N "" -C "you@paperhawk-amd"
31
+ cat ~/.ssh/id_ed25519_amd_paperhawk.pub
32
+ ```
33
+
34
+ Copy the public key to clipboard for the next step.
35
+
36
+ ---
37
+
38
+ ## Step 2 — Create a GPU Droplet
39
+
40
+ Go to <https://cloud.amd.com/> (or <https://amd.digitalocean.com/>) and click **Create a GPU Droplet** on the homepage card.
41
+
42
+ **Caution**: the left-sidebar `GPU Droplets` link routes to the CPU Droplet flow as of May 2026 (a UI bug). Use the homepage card or the top-right `Create ▼` dropdown.
43
+
44
+ ### Configuration
45
+
46
+ - **GPU Plan**: AMD MI300X (single-GPU, $1.99/hr) — **not** the 8-GPU variant
47
+ - **Region**: ATL1 (Atlanta) — NYC1 is often "out of capacity" for MI300X. If the Plan card is greyed out, the URL parameter `?region=atl1` switches you over.
48
+ - **Image**: Quick Start → vLLM (0.17.1, ROCm 7.0) — comes with Docker, JupyterLab, and a pre-built `rocm` container
49
+ - **SSH Key**: Add a new key, paste the public key from step 1, name it `paperhawk-amd-deploy`
50
+ - **Visibility**: doesn't matter; the droplet is private to your account
51
+
52
+ Click **Create GPU Droplet**. It takes 5–10 minutes to come up. Once `Active`, note the Public IPv4 address.
53
+
54
+ ---
55
+
56
+ ## Step 3 — SSH in
57
+
58
+ ```bash
59
+ ssh -i ~/.ssh/id_ed25519_amd_paperhawk -o IdentityAgent=none root@<DROPLET_IP>
60
+ ```
61
+
62
+ The `-o IdentityAgent=none` flag bypasses the GNOME-keyring SSH agent if it's misbehaving on your local machine.
63
+
64
+ You'll see a welcome banner with two key facts:
65
+
66
+ ```
67
+ Access the Jupyter Server: http://<IP>:80 (we don't use this)
68
+ docker exec -it rocm /bin/bash (we DO use this)
69
+ ```
70
+
71
+ ---
72
+
73
+ ## Step 4 — Open port 8000 in the firewall
74
+
75
+ The Quick Start image ships with UFW enabled, allowing only SSH (22), HTTP (80), and HTTPS (443). vLLM runs on 8000, so we need to open it:
76
+
77
+ ```bash
78
+ ufw allow 8000
79
+ ufw status | grep 8000
80
+ ```
81
+
82
+ You should see `8000 ALLOW Anywhere` and the IPv6 equivalent.
83
+
84
+ The `--api-key` flag we pass to vLLM in step 6 prevents anyone scanning the public internet from using your endpoint — opening port 8000 is safe with API-key auth.
85
+
86
+ ---
87
+
88
+ ## Step 5 — (Optional) System upgrade and reboot
89
+
90
+ The Quick Start image ships with ~120 outdated packages, several of them pending security updates. Upgrading is recommended before snapshotting:
91
+
92
+ ```bash
93
+ apt-get update && DEBIAN_FRONTEND=noninteractive apt-get upgrade -y
94
+ reboot
95
+ ```
96
+
97
+ Wait ~1.5–2 minutes, then SSH in again. **The `rocm` Docker container does not auto-restart after the reboot**, so:
98
+
99
+ ```bash
100
+ docker start rocm
101
+ docker ps # confirm `rocm` is Up
102
+ ```
103
+
104
+ ---
105
+
106
+ ## Step 6 — Start vLLM serving Qwen 2.5 14B
107
+
108
+ Enter the Docker container:
109
+
110
+ ```bash
111
+ docker exec -it rocm /bin/bash
112
+ ```
113
+
114
+ Run vLLM in one long line (line continuations with `\` sometimes break under paste — single-line is most reliable):
115
+
116
+ ```bash
117
+ vllm serve Qwen/Qwen2.5-14B-Instruct --api-key sk-paperhawk-2026 --port 8000 --host 0.0.0.0 --enable-auto-tool-choice --tool-call-parser hermes --trust-remote-code
118
+ ```
119
+
120
+ What this does:
121
+
122
+ | Flag | Why |
123
+ |---|---|
124
+ | `Qwen/Qwen2.5-14B-Instruct` | Model ID on Hugging Face Hub. vLLM auto-downloads on first run (~28 GB, ~6 sec from ATL DC) |
125
+ | `--api-key sk-paperhawk-2026` | Bearer token required by every request. Anti-misuse for the public-internet endpoint. |
126
+ | `--port 8000` | OpenAI-compat REST at `:8000/v1` |
127
+ | `--host 0.0.0.0` | Bind on all interfaces so the public IP is reachable |
128
+ | `--enable-auto-tool-choice` + `--tool-call-parser hermes` | Required for our 5-tool agentic chat. Qwen 2.5 uses Hermes-style tool calls. |
129
+ | `--trust-remote-code` | Qwen 2.5's tokenizer no longer needs custom code in current transformers, so the flag is effectively a no-op here; kept for compatibility with older builds |
130
+
131
+ **What you'll see on first run** (~70 seconds total):
132
+
133
+ ```
134
+ INFO 05-04 20:56:36 [utils.py:302] (vLLM ASCII-art banner) version 0.17.1
135
+ INFO 05-04 20:56:36 [utils.py:302] (vLLM ASCII-art banner) model Qwen/Qwen2.5-14B-Instruct
136
+ config.json: 100%|████████████████████| 663/663 [00:00<00:00, 8.25MB/s]
137
+ model-00001-of-00008.safetensors: 100%|██████| 3.89G/3.89G [00:05<00:00, 745MB/s]
138
+ ... (8 shards, ~28 GB total in 5.9 sec)
139
+ INFO 05-04 20:57:08 [gpu_model_runner.py:4364] Model loading took 27.63 GiB memory and 17.358448 seconds
140
+ INFO 05-04 20:57:32 [gpu_worker.py:424] Available KV cache memory: 141.96 GiB
141
+ INFO 05-04 20:57:32 [kv_cache_utils.py:1314] GPU KV cache size: 775,280 tokens
142
+ INFO 05-04 20:57:32 [kv_cache_utils.py:1319] Maximum concurrency for 32,768 tokens per request: 23.66x
143
+ INFO: Application startup complete.
144
+ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
145
+ ```
146
+
147
+ The vLLM server now serves OpenAI-compatible requests. **Don't close this SSH session** — closing it kills the server. Open a second SSH window for the smoke test.
148
+
149
+ ---
150
+
151
+ ## Step 7 — Smoke-test the endpoint
152
+
153
+ From your local machine:
154
+
155
+ ```bash
156
+ # List models
157
+ curl http://<DROPLET_IP>:8000/v1/models -H "Authorization: Bearer sk-paperhawk-2026"
158
+
159
+ # Chat completion
160
+ curl http://<DROPLET_IP>:8000/v1/chat/completions \
161
+ -H "Content-Type: application/json" \
162
+ -H "Authorization: Bearer sk-paperhawk-2026" \
163
+ -d '{"model":"Qwen/Qwen2.5-14B-Instruct","messages":[{"role":"user","content":"Hello, who are you? Answer in one sentence."}],"max_tokens":50,"temperature":0}'
164
+ ```
165
+
166
+ Expected response: `"I am Qwen, a large language model created by Alibaba Cloud."`
167
+
168
+ If you get `401 Unauthorized`, the Bearer token is wrong (must match the `--api-key` value exactly). If you get `Connection refused`, port 8000 isn't open or the vLLM server didn't start — check the SSH window from step 6.
169
+
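+ The same smoke test from Python, for wiring your own client against the endpoint. A minimal sketch assuming the `openai` package (v1+) is installed; the placeholders match the curl example above:
+
+ ```python
+ from openai import OpenAI
+
+ # Point the standard OpenAI client at the vLLM endpoint.
+ client = OpenAI(
+     base_url="http://<DROPLET_IP>:8000/v1",  # your droplet's public IP
+     api_key="sk-paperhawk-2026",             # must match vLLM's --api-key
+ )
+
+ resp = client.chat.completions.create(
+     model="Qwen/Qwen2.5-14B-Instruct",
+     messages=[{"role": "user", "content": "Hello, who are you? Answer in one sentence."}],
+     max_tokens=50,
+     temperature=0,
+ )
+ print(resp.choices[0].message.content)
+ ```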
170
+ ---
171
+
172
+ ## Step 8 — Snapshot the droplet (cost optimization)
173
+
174
+ Once everything works, take a live snapshot. It captures the entire boot disk (~96 GB including the Docker container with the cached Qwen model), so a future restart is **30 seconds** instead of a 70-second cold start.
175
+
176
+ In the AMD Cloud UI:
177
+
178
+ 1. Droplet → **Backups & Snapshots** tab → **Take a Snapshot**
179
+ 2. Name: `paperhawk-vllm-tested-YYYY-MM-DD`
180
+ 3. Click **Take Live Snapshot** (live works fine — vLLM does only read-only inference)
181
+
182
+ The snapshot takes 10–15 minutes. Storage cost: $0.06 / GB / month × ~96 GB ≈ $5.76 / month, i.e. **~$0.19 / day**.
183
+
184
+ ---
185
+
186
+ ## Step 9 — Destroy the droplet (stop the meter)
187
+
188
+ When you're done with the dev session, **destroy** the droplet (do not just power-off — powered-off droplets still bill at $1.99/hr).
189
+
190
+ In the UI: Droplet → **Actions** ▼ → **Destroy** → type the droplet name to confirm.
191
+
192
+ **Important**: when the destroy dialog asks if you also want to destroy the snapshot, **leave it unchecked**. The snapshot survives the destroy and is what you'll use to recreate the droplet.
193
+
194
+ ---
195
+
196
+ ## Step 10 — Recreate from snapshot (Friday morning)
197
+
198
+ When you need the endpoint live again (e.g., for a demo or judging window):
199
+
200
+ 1. AMD Cloud → **Backups & Snapshots** → click `…` next to your snapshot → **Create GPU Droplet**
201
+ 2. Configuration: same MI300X / ATL1 / SSH key
202
+ 3. Wait 5–10 minutes for `Active`. Note the new public IP.
203
+
204
+ Then SSH in (with the new IP) and:
205
+
206
+ ```bash
207
+ docker start rocm
208
+ docker exec -it rocm /bin/bash
209
+ vllm serve Qwen/Qwen2.5-14B-Instruct --api-key sk-paperhawk-2026 --port 8000 --host 0.0.0.0 --enable-auto-tool-choice --tool-call-parser hermes --trust-remote-code
210
+ ```
211
+
212
+ Because the snapshot includes the cached model in the Docker container layer, **vLLM startup is ~30 seconds** instead of 70.
213
+
214
+ ---
215
+
216
+ ## Live performance numbers (measured)
217
+
218
+ From our end-to-end test on May 5, 2026:
219
+
220
+ | Metric | Value |
221
+ |---|---|
222
+ | HF Hub model download (8 safetensors, 28 GB) | 5.9 sec (8 shards in parallel at 700+ MB/s each, from the ATL DC) |
223
+ | Model load to MI300X VRAM | 17.4 sec |
224
+ | CUDA graph compile (51 size-buckets) | 20.5 sec |
225
+ | **Total cold-start** | **~70 sec** |
226
+ | **Warm restart from snapshot** | **~30 sec** |
227
+ | Available KV cache (192 GB − 27.6 GB model − 22 GB headroom) | 141.96 GiB |
228
+ | KV cache token capacity | 775,280 tokens |
229
+ | Max concurrency at 32k context | 23.66× parallel requests |
230
+ | Prompt throughput (live audit demo) | 307 tokens/sec |
231
+ | Generation throughput (live audit demo) | 252 tokens/sec |
232
+ | Prefix cache hit rate (multi-agent prompts) | 30.4% |
233
+ | End-to-end audit demo (3 PDFs from HF Space) | 23.3 sec / 61.7× speedup vs manual |
234
+
235
+ ---
236
+
237
+ ## Cost breakdown (our actual hackathon spend)
238
+
239
+ | Item | Cost |
240
+ |---|---|
241
+ | Initial dev session (provisioning, vLLM setup, debugging) | ~$3 |
242
+ | Live validation session (30 minutes) | ~$1 |
243
+ | Snapshot storage (5 days from Tuesday to Friday) | ~$1 |
244
+ | Live judging window (estimated 24 hours) | ~$48 |
245
+ | **Total estimated** | **~$53** of the free $100 credit |
246
+
247
+ Plenty of buffer for a longer judging window or a second iteration.
248
+
249
+ ---
250
+
251
+ ## Common pitfalls
252
+
253
+ - **"Out of capacity in the selected region"**: Switch to ATL1. NYC1 frequently runs out of MI300X. Pass `?region=atl1` in the Create-Droplet URL.
254
+ - **`Permission denied (publickey)` on SSH**: Either the `~/.ssh/id_ed25519` is passphrase-protected and the agent isn't unlocked, or you have the wrong key. Use a dedicated passphrase-less key (step 1) and `-o IdentityAgent=none` on the ssh command.
255
+ - **vLLM exits with `Triton FlashAttention error` on first run**: Older vLLM 0.8.x builds had this issue. The 0.17.1 + ROCm 7.0 build we use has it fixed. If you're stuck on an older image, prefix with `VLLM_USE_TRITON_FLASH_ATTN=0`.
256
+ - **Docker container `rocm` not running after reboot**: run `docker start rocm` manually; the container is not auto-started by default.
257
+ - **Powered-off droplet still billing**: Power-off does **not** stop billing. Only **Destroy** does. Snapshot first if you want to keep the state.
258
+
259
+ ---
260
+
261
+ ## Cross-references
262
+
263
+ - [`docs/HUGGINGFACE_DEPLOYMENT.md`](HUGGINGFACE_DEPLOYMENT.md) — how the Streamlit Space talks to this vLLM endpoint
264
+ - [`docs/ARCHITECTURE.md`](ARCHITECTURE.md) — how the application uses the vLLM endpoint via the provider abstraction
265
+ - [`docs/AMD_DEPLOY_LESSONS_LEARNED.md`](AMD_DEPLOY_LESSONS_LEARNED.md) — extended history of every push iteration, error message, and workaround we hit
docs/ARCHITECTURE.md ADDED
@@ -0,0 +1,229 @@
 
1
+ # PaperHawk Architecture
2
+
3
+ How PaperHawk is built and why each piece is where it is. This document explains the multi-graph LangGraph orchestration, the 14 deterministic domain checks, the 6-layer anti-hallucination stack, and the multi-agent DD assistant.
4
+
5
+ ---
6
+
7
+ ## High-level architecture
8
+
9
+ ```
10
+ ┌──────────────────────────────────────────────────────────────────────────┐
11
+ │ USER (Streamlit 5-tab UI) │
12
+ │ Upload │ Results │ Chat │ DD Assistant │ Report │
13
+ └────────────────────────────────┬─────────────────────────────────────────┘
14
+
15
+ ┌────────────────────┼────────────────────────┐
16
+ │ │ │
17
+ ▼ ▼ ▼
18
+ ┌──────────────────┐ ┌──────────────────┐ ┌─────────────────────────┐
19
+ │ pipeline_graph │ │ chat_graph │ │ dd_graph │
20
+ │ │ │ │ │ │
21
+ │ Ingest → │ │ Intent classify │ │ Contract filter → │
22
+ │ Classify → │ │ → Plan → │ │ Per-contract summary → │
23
+ │ Extract → │ │ Agent (5 tools) │ │ Multi-agent specialists │
24
+ │ Compare → │ │ → Synthesizer → │ │ (audit/legal/compliance │
25
+ │ Risk → │ │ Validator │ │ /financial) → │
26
+ │ Report │ │ ([Source: …]) │ │ Supervisor → Synthesizer│
27
+ └──────────────────┘ └──────────────────┘ └─────────────────────────┘
28
+ │ │
29
+ └─────────────┬──────────────────────────┘
30
+
31
+ ┌──────────────────────────┐
32
+ │ package_insights_graph │
33
+ │ │
34
+ │ Cross-document analysis │
35
+ │ (price-drift, dupes, │
36
+ │ three-way matching) │
37
+ └──────────────────────────┘
38
+
39
+
40
+ ┌──────────────────────────┐
41
+ │ Provider abstraction │
42
+ │ (configurable_alternatives)
43
+ │ │
44
+ │ vLLM ←→ Ollama ←→ Dummy │
45
+ └──────────────────────────┘
46
+
47
+
48
+ ┌──────────────────────────┐
49
+ │ AMD MI300X (vLLM) │
50
+ │ Qwen 2.5 14B Instruct │
51
+ │ 192 GB HBM3, ROCm 7.0 │
52
+ └──────────────────────────┘
53
+ ```
54
+
55
+ ---
56
+
57
+ ## Compiled graphs (4)
58
+
59
+ Every entry-point in the system is a separately compiled LangGraph artifact with its own typed state and `AsyncSqliteSaver` checkpointer:
60
+
61
+ ### 1. `pipeline_graph` — the document processing pipeline
62
+
63
+ The 6-step end-to-end flow when the user uploads a package:
64
+
65
+ 1. **Ingest** — PDF (PyMuPDF + pdfplumber for table extraction), DOCX (native), images (vision-first via the LLM), with Tesseract OCR fallback for scanned PDFs (EN/HU/DE)
66
+ 2. **Classify** — 6-way doc-type classifier with structured output (`invoice`, `delivery_note`, `purchase_order`, `contract`, `financial_report`, `other`); ISA 500 evidence-quality score
67
+ 3. **Extract** — per doc-type Pydantic v2 schema with `_quotes` and `_confidence` fields (shape sketched below); universal fallback schema for unknown types
68
+ 4. **Compare** — three-way matching subgraph (invoice + delivery note + PO), duplicate-invoice detection (ISA 240)
69
+ 5. **Risk** — basic plausibility + 14 domain checks (Send-API parallel fan-out) + LLM risk ensemble + 3-stage filter chain
70
+ 6. **Report** — DOCX export, JSON output, Streamlit UI rendering
71
+
72
+ State: `PipelineState` (Pydantic), with reducers for risk lists and per-document results.
73
+
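+ To make step 3 concrete, a hedged sketch of a per-doc-type schema. Field names are illustrative, not the repo's actual class; Pydantic v2 reserves leading-underscore attribute names, so the `_quotes` / `_confidence` fields are modeled via aliases:
+
+ ```python
+ from pydantic import BaseModel, Field
+
+ class InvoiceExtraction(BaseModel):
+     """Illustrative shape, not the repo's actual class."""
+     supplier_name: str
+     invoice_number: str
+     total_amount: float
+     currency: str
+     # Verbatim source quote per extracted field, serialized as "_quotes".
+     quotes: dict[str, str] = Field(default_factory=dict, alias="_quotes")
+     # Per-field confidence ("high" / "medium" / "low"), serialized as "_confidence".
+     confidence: dict[str, str] = Field(default_factory=dict, alias="_confidence")
+ ```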
74
+ ### 2. `chat_graph` — the agentic chat
75
+
76
+ 5-tool ReAct agent with strict citation enforcement:
77
+
78
+ - **Tools**: `list_documents`, `get_extraction`, `search_documents` (hybrid Chroma + BM25 with Reciprocal Rank Fusion), `compare_documents`, `validate_document`
79
+ - **Prompt**: 17-rule system prompt enforcing `[Source: filename.pdf]` format
80
+ - **Validator node**: post-processor that drops any answer without citations (sketched below)
81
+ - **Intent classifier**: routes to direct-answer vs tool-use paths to keep latency low for casual queries
82
+
83
+ State: `ChatState` with message history, retrieved chunks, and citation list.
84
+
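+ A sketch of the validator node's core rule (the regex and function name here are illustrative, not the repo's actual code):
+
+ ```python
+ import re
+
+ # Matches the citation format the 17-rule system prompt enforces.
+ CITATION_RE = re.compile(r"\[Source:\s*[^\]]+\.(pdf|docx)\]", re.IGNORECASE)
+
+ def validate_answer(answer: str) -> str | None:
+     """Pass the answer through only if it carries at least one citation."""
+     return answer if CITATION_RE.search(answer) else None
+ ```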
85
+ ### 3. `dd_graph` — the multi-agent DD assistant
86
+
87
+ For M&A due-diligence packages:
88
+
89
+ - **Contract filter** — selects only contract-type documents from the package
90
+ - **Per-contract summary** — extracts each contract's key terms (parties, term, value, change-of-control, non-compete, auto-renewal)
91
+ - **4 specialist agents** (run in parallel via the Send API; fan-out sketched after this section):
92
+ - `audit_specialist` — material misstatement risk, ISA 240 fraud indicators
93
+ - `legal_specialist` — change-of-control, non-compete, automatic-renewal red flags
94
+ - `compliance_specialist` — GDPR Art. 28 sub-processor language, AML counterparty checks
95
+ - `financial_specialist` — Ptk. 6:98 disproportionate penalty clauses, materiality thresholds
96
+ - **Supervisor** — coordinates specialists, drops business-normal noise
97
+ - **Synthesizer** — writes 3-paragraph executive summary
98
+
99
+ State: `DDState` with contract list, per-contract summaries, specialist findings, executive summary.
100
+
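+ The fan-out itself is a conditional edge that returns one `Send` per specialist. A sketch assuming a dict-shaped state (specialist node names taken from the list above):
+
+ ```python
+ from langgraph.types import Send
+
+ SPECIALISTS = ["audit_specialist", "legal_specialist",
+                "compliance_specialist", "financial_specialist"]
+
+ def fan_out_to_specialists(state: dict) -> list[Send]:
+     """Conditional-edge function: LangGraph runs the returned nodes in parallel."""
+     return [Send(name, {"contract": contract})
+             for name in SPECIALISTS
+             for contract in state["contracts"]]
+ ```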
101
+ ### 4. `package_insights_graph` — cross-document analysis
102
+
103
+ Package-level analyzers that don't fit into the per-document pipeline:
104
+
105
+ - **Pricing-drift detector** — flags > 30% price changes for the same line item across invoices in a package (caught the 57.5% drift in our live demo; sketched below)
106
+ - **Duplicate-invoice detector** — exact + near-match (date within 13 days, amount within 1%)
107
+ - **Counterparty consistency** — same supplier name spelled differently across documents
108
+
109
+ State: `PackageState` with per-document extractions and aggregated findings.
110
+
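+ A minimal sketch of the pricing-drift rule under the stated >30% threshold (field names are assumptions, not the repo's actual schema):
+
+ ```python
+ from collections import defaultdict
+
+ DRIFT_THRESHOLD = 0.30  # flag >30% change for the same line item
+
+ def price_drift_flags(invoices: list[dict]) -> list[str]:
+     """invoices: [{"file": str, "items": [{"name": str, "unit_price": float}]}]"""
+     prices = defaultdict(list)
+     for inv in invoices:
+         for item in inv["items"]:
+             prices[item["name"]].append((inv["file"], item["unit_price"]))
+     flags = []
+     for name, seen in prices.items():
+         base_file, base_price = seen[0]      # assumes a non-zero first price
+         for file, price in seen[1:]:
+             drift = (price - base_price) / base_price
+             if abs(drift) > DRIFT_THRESHOLD:
+                 flags.append(f"{name}: {drift:+.1%} ({base_file} → {file})")
+     return flags
+ ```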
111
+ ---
112
+
113
+ ## Subgraphs (6)
114
+
115
+ Reusable LangGraph subgraphs imported by the main graphs:
116
+
117
+ | Subgraph | Purpose |
118
+ |---|---|
119
+ | `extract_subgraph` | Per-document extraction with quote validator |
120
+ | `ingest_subgraph` | PDF/DOCX/image loading with OCR fallback |
121
+ | `llm_risk_subgraph` | LLM risk generation with structured output |
122
+ | `rag_index_subgraph` | Chunking, embedding, ChromaDB indexing |
123
+ | `rag_query_subgraph` | Hybrid Chroma + BM25 retrieval with RRF |
124
+ | `risk_subgraph` | Domain check fan-out + LLM risk + 3-stage filters |
125
+
126
+ ---
127
+
128
+ ## 14 deterministic domain checks
129
+
130
+ The check registry (`domain_checks/__init__.py`) is the heart of PaperHawk's auditor-grade output. Every check is a Python `Protocol` implementation, not an LLM prompt — they cannot hallucinate, can be unit-tested, and produce defensible findings with explicit regulation sources.
131
+
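+ A hedged sketch of that shape (the `Protocol` detail is from this doc; the method signature, the `Finding` class, and the field names are illustrative, reusing the Ptk. 6:98 threshold listed under A-tier below):
+
+ ```python
+ from typing import Protocol
+
+ class Finding:
+     """One defensible finding with an explicit regulation source."""
+     def __init__(self, severity: str, message: str, source: str):
+         self.severity = severity  # "HIGH" / "MEDIUM" / "LOW" / "INFO"
+         self.message = message
+         self.source = source      # e.g. "Ptk. 6:98", "ISA 240"
+
+ class DomainCheck(Protocol):
+     """The shape every registered check implements (signature assumed)."""
+     def run(self, extraction: dict) -> list[Finding]: ...
+
+ class DisproportionalityCheck:
+     """Ptk. 6:98 — a penalty clause above 31.7% of contract value is HIGH."""
+     def run(self, extraction: dict) -> list[Finding]:
+         penalty = extraction.get("penalty_amount", 0)
+         value = extraction.get("contract_value", 0)
+         if value and penalty / value > 0.317:
+             return [Finding("HIGH",
+                             f"Penalty is {penalty / value:.1%} of contract value",
+                             "Ptk. 6:98")]
+         return []
+ ```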
132
+ ### A-tier (essential)
133
+
134
+ 1. **Mandatory invoice elements** (HU VAT Act §169) — 18 required elements per invoice
135
+ 2. **Tax-ID checksum** (Art. 22 §) — mod-11 Hungarian tax-ID validation
136
+ 3. **Contract completeness** (Ptk. Book 6) — termination, governing law, penalty, confidentiality clauses
137
+ 4. **Disproportionality** (Ptk. 6:98) — penalty clause > 31.7% of contract value flagged HIGH
138
+ 5. **Rounded amounts** (ISA 240) — > 14.7% rounded amounts flagged suspicious, > 24.3% flagged HIGH
139
+ 6. **Evidence hierarchy** (ISA 500) — document-type reliability score (8/10 invoice, 7/10 contract)
140
+
141
+ ### B-tier (supplementary)
142
+
143
+ 7. **Materiality** (ISA 320) — 1.93% of document value as info-level threshold
144
+ 8. **GDPR Article 28** — 10 mandatory sub-processor language elements + PII detection
145
+ 9. **DD red flags** (M&A) — change-of-control, non-compete, automatic-renewal triggers
146
+
147
+ ### C-tier (informational)
148
+
149
+ 10. **Incoterms 2020** — 11 incoterm rules detected via regex word-boundaries
150
+ 11. **IFRS/HAR anomaly** — goodwill amortization flag, operational lease in IFRS context
151
+ 12. **Duplicate invoice** (ISA 240) — exact + near-match with 13-day date filter
152
+ 13. **AML sanctions** (Pmt.) — static EU/OFAC snapshot with fuzzy name match
153
+ 14. **Contract dates** — start-end consistency, expiry detection
154
+
155
+ **Jurisdiction-aware**: Hungarian-specific rules (HU VAT Act, Ptk., Art.) apply only to Hungarian documents. Universal rules (ISA, GDPR, Incoterms, AML) apply everywhere.
156
+
157
+ ---
158
+
159
+ ## 6-layer anti-hallucination stack
160
+
161
+ The system is designed so the LLM **cannot** lie about a document and have the lie pass through.
162
+
163
+ | Layer | What it does |
164
+ |---|---|
165
+ | 1. `temperature=0` | Deterministic outputs every run |
166
+ | 2. Source quote requirement | Every extraction must include a verbatim quote from the source PDF in `_quotes` |
167
+ | 3. Confidence scoring | high / medium / low per extracted field, surfaced to the user |
168
+ | 4. Plausibility validators | Deterministic Python checks for math, dates, totals, item-level VAT, currency normalization |
169
+ | 5. 3-stage LLM-risk filter chain | Drops business-normal noise, drops repeats of basic deterministic checks, drops contradictions |
170
+ | 6. Quote validator | Text-search the source PDF for the claimed quote; downgrade confidence if not found verbatim, drop entirely if obviously fabricated |
171
+
172
+ In our live audit demo, layer 6 caught **4 of 6** hallucinated citations from Qwen 2.5 14B and downgraded them to `low` confidence.
173
+
174
+ The `validation/` package is one of the most-edited folders in the repo precisely because we treat anti-hallucination as a first-class concern, not a guardrail layer slapped on top.
175
+
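+ Layer 6's core idea fits in a few lines. A minimal sketch, assuming whitespace-normalized substring search (the repo's matching rules may be more nuanced):
+
+ ```python
+ def validate_quote(claimed_quote: str, source_text: str) -> str:
+     """Decide what to do with a claimed verbatim quote: keep / downgrade / drop."""
+     norm = " ".join(claimed_quote.split())      # collapse PDF line breaks
+     haystack = " ".join(source_text.split())
+     if norm and norm in haystack:
+         return "keep"       # found verbatim
+     # Partial match on longer sentence fragments: keep the field, mark it "low".
+     fragments = [f for f in norm.split(". ") if len(f) > 20]
+     if any(f in haystack for f in fragments):
+         return "downgrade"
+     return "drop"           # nothing matches: treat as fabricated
+ ```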
176
+ ---
177
+
178
+ ## Provider abstraction
179
+
180
+ `configurable_alternatives` lets us swap LLM backends with a single env var:
181
+
182
+ | `LLM_PROFILE` | Backend | Use case |
183
+ |---|---|---|
184
+ | `vllm` | vLLM REST endpoint (OpenAI-compatible) | Production on AMD MI300X |
185
+ | `ollama` | Local Ollama at `localhost:11434` | Dev on consumer GPU |
186
+ | `dummy` | Deterministic stub | CI tests, smoke tests, judge quick-demo |
187
+
188
+ The application code never imports an LLM SDK directly — all calls go through `providers/` factory functions with `configurable_alternatives`. Switching from Anthropic Claude (our original dev target) to Qwen on vLLM required **zero application code changes** — only env vars.
189
+
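+ A sketch of the factory, using LangChain's `configurable_alternatives` (the class choices, the env-var plumbing, and the `FakeListChatModel` stand-in for the dummy profile are assumptions about the repo's actual wiring):
+
+ ```python
+ import os
+ from langchain_core.language_models.fake_chat_models import FakeListChatModel
+ from langchain_core.runnables import ConfigurableField
+ from langchain_ollama import ChatOllama
+ from langchain_openai import ChatOpenAI
+
+ def make_llm():
+     """One runnable, three backends, selected by the llm_profile config key."""
+     vllm = ChatOpenAI(
+         base_url=os.getenv("VLLM_BASE_URL"),
+         api_key=os.getenv("VLLM_API_KEY"),
+         model=os.getenv("VLLM_MODEL", "Qwen/Qwen2.5-14B-Instruct"),
+         temperature=0,
+     )
+     return vllm.configurable_alternatives(
+         ConfigurableField(id="llm_profile"),
+         default_key="vllm",
+         ollama=ChatOllama(model=os.getenv("OLLAMA_MODEL", "qwen2.5:14b-instruct"),
+                           temperature=0),
+         dummy=FakeListChatModel(responses=["{}"]),  # deterministic CI stub
+     )
+
+ # Swap backends with the env var alone, zero code changes:
+ llm = make_llm().with_config(
+     configurable={"llm_profile": os.getenv("LLM_PROFILE", "vllm")})
+ ```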
190
+ ---
191
+
192
+ ## Embedding + retrieval
193
+
194
+ - **Model**: BAAI/bge-m3 (1024-dim, multilingual EN/HU/DE/FR via sentence-transformers)
195
+ - **Storage**: ChromaDB persistent (per-session) + BM25 in-memory keyword index
196
+ - **Hybrid retrieval**: Reciprocal Rank Fusion of Chroma top-K and BM25 top-K (sketched below)
197
+ - **Chunking**: Natural-boundary chunking (paragraph-aware, ~500 tokens with overlap)
198
+
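+ Reciprocal Rank Fusion in its standard form, as a sketch (k=60 is the conventional constant from the original RRF paper; the repo's exact value is not stated here):
+
+ ```python
+ def rrf_merge(chroma_ids: list[str], bm25_ids: list[str], k: int = 60) -> list[str]:
+     """Fuse two ranked ID lists: score(doc) = sum of 1 / (k + rank) over both lists."""
+     scores: dict[str, float] = {}
+     for ranking in (chroma_ids, bm25_ids):
+         for rank, doc_id in enumerate(ranking, start=1):
+             scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
+     return sorted(scores, key=scores.get, reverse=True)
+ ```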
199
+ The embedding model loads once at app startup (~2.3 GB to RAM/VRAM). On first run it downloads from Hugging Face Hub to `~/.cache/huggingface/`.
200
+
201
+ ---
202
+
203
+ ## State persistence
204
+
205
+ - **Per-session**: Streamlit `session_state` for UI state (uploaded files, current package)
206
+ - **Per-graph**: `AsyncSqliteSaver` checkpointer at `data/checkpoints.sqlite` for LangGraph state
207
+ - **Vector store**: ChromaDB at `chroma_db/` (gitignored)
208
+
209
+ Restarting the app loads the last checkpoint, so chat history and extraction results survive a restart.
210
+
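+ Wiring a graph to the checkpointer looks roughly like this (a sketch; `builder` stands in for any of the compiled graphs' `StateGraph` builders):
+
+ ```python
+ from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver
+
+ async def run_with_checkpoints(builder, state: dict, thread_id: str):
+     # Checkpoints land in data/checkpoints.sqlite; one thread_id per session.
+     async with AsyncSqliteSaver.from_conn_string("data/checkpoints.sqlite") as saver:
+         graph = builder.compile(checkpointer=saver)
+         config = {"configurable": {"thread_id": thread_id}}
+         return await graph.ainvoke(state, config=config)
+ ```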
211
+ ---
212
+
213
+ ## Streamlit UI (5 tabs)
214
+
215
+ 1. **Upload** — drag-and-drop (PDF, DOCX, PNG, JPG, TXT), 200 MB per file, plus 3 pre-bundled demo packages
216
+ 2. **Results** — classification confidence, extracted data, risks per document, package-level cross-doc analysis
217
+ 3. **Chat** — agentic chat with `[Source: filename.pdf]` citations
218
+ 4. **DD Assistant** — for M&A packages: per-contract summaries + 4 specialist findings + executive summary + downloadable DOCX
219
+ 5. **Report** — JSON output + DOCX export
220
+
221
+ The async runtime uses a long-lived background event loop (`app/async_runtime.py`) so the UI stays responsive during multi-minute pipeline runs.
222
+
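+ The usual shape of that pattern, as a minimal sketch (the repo's `app/async_runtime.py` may differ in detail):
+
+ ```python
+ import asyncio
+ import threading
+
+ # One background loop for the whole process, started once at import time.
+ _loop = asyncio.new_event_loop()
+ threading.Thread(target=_loop.run_forever, daemon=True).start()
+
+ def run_async(coro):
+     """Submit a coroutine to the background loop and block until it finishes,
+     keeping Streamlit's script thread free of its own event loop."""
+     return asyncio.run_coroutine_threadsafe(coro, _loop).result()
+ ```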
223
+ ---
224
+
225
+ ## Cross-references
226
+
227
+ - [`docs/AMD_DEPLOYMENT.md`](AMD_DEPLOYMENT.md) — how the production vLLM endpoint runs on AMD MI300X
228
+ - [`docs/HUGGINGFACE_DEPLOYMENT.md`](HUGGINGFACE_DEPLOYMENT.md) — how the Streamlit app deploys as a public HF Space
229
+ - [`docs/SUBMISSION.md`](SUBMISSION.md) — full hackathon submission brief with TAM/SAM, competitor positioning, live deployment validation
docs/HF_SPACE_DEFAULT_GETTING_STARTED.md DELETED
@@ -1,193 +0,0 @@
1
- # HF Space Default Getting Started — Snapshot 2026-05-05
2
-
3
- After the `lablab-ai-amd-developer-hackathon/paperhawk` Space is created, HF Spaces shows a default "Get Started" guide. We archive it here as a reference, because its default Dockerfile pattern is a useful reference for rewriting the paperhawk Dockerfile (port 8501 → 7860, user-setup pattern).
4
-
5
- **Source**: appeared at the bottom of the Space page, after the default README.
6
-
7
- **URL**: https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/paperhawk
8
-
9
- **Context**: the Space freshly created, with Docker SDK + Blank template + the `Real-DI-Audit/14 rules/6 anti-halluc/LangGraph/Qwen/MI300X` short description.
10
-
11
- ---
12
-
13
- ## Get started with your Docker Space!
14
-
15
- Your space has been created, follow these steps to get started (or read the full [documentation](https://huggingface.co/docs/hub/spaces-sdks-docker))
16
-
17
- ### Start by cloning this repo by using:
18
-
19
- **HTTPS:**
20
-
21
- ```bash
22
- git clone https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/paperhawk
23
- ```
24
-
25
- **SSH:**
26
-
27
- ```bash
28
- git clone git@hf.co:spaces/lablab-ai-amd-developer-hackathon/paperhawk
29
- ```
30
-
31
- ### Make sure you're CLI v2.x.x or above:
32
-
33
- ```bash
34
- curl -LsSf https://hf.co/cli/install.sh | sh
35
- ```
36
-
37
- ### Download the Space:
38
-
39
- ```bash
40
- hf download lablab-ai-amd-developer-hackathon/paperhawk --repo-type=space
41
- ```
42
-
43
- ---
44
-
45
- ## Let's create a simple Python app using FastAPI
46
-
47
- ### `requirements.txt`
48
-
49
- ```
50
- fastapi
51
- uvicorn[standard]
52
- ```
53
-
54
- > **Hint:** You can also create the requirements file directly in your browser.
55
-
56
- ### `app.py`
57
-
58
- ```python
59
- from fastapi import FastAPI
60
-
61
- app = FastAPI()
62
-
63
- @app.get("/")
64
- def greet_json():
65
- return {"Hello": "World!"}
66
- ```
67
-
68
- > **Hint:** You can also create the app file directly in your browser.
69
-
70
- ---
71
-
72
- ## Create your Dockerfile
73
-
74
- ```dockerfile
75
- # Read the doc: https://huggingface.co/docs/hub/spaces-sdks-docker
76
- # you will also find guides on how best to write your Dockerfile
77
-
78
- FROM python:3.9
79
-
80
- RUN useradd -m -u 1000 user
81
- USER user
82
- ENV PATH="/home/user/.local/bin:$PATH"
83
-
84
- WORKDIR /app
85
-
86
- COPY --chown=user ./requirements.txt requirements.txt
87
- RUN pip install --no-cache-dir --upgrade -r requirements.txt
88
-
89
- COPY --chown=user . /app
90
- CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
91
- ```
92
-
93
- > **Hint:** Alternatively, you can create the Dockerfile file directly in your browser.
94
-
95
- ---
96
-
97
- ## Then commit and push
98
-
99
- ```bash
100
- git add requirements.txt app.py Dockerfile
101
- git commit -m "Add application file"
102
- git push
103
- ```
104
-
105
- > Finally, your Space should be running on this page after a few moments!
106
-
107
- ---
108
-
109
- ## App port
110
-
111
- > Your Docker Space needs to listen on port `7860`.
112
-
113
- ## Personalize your Space
114
-
115
- Make your Space stand out by customizing its emoji, colors, and description by **editing metadata** in its `README.md` file.
116
-
117
- ## Documentation
118
-
119
- Read the full documentation for Docker Spaces [here](https://huggingface.co/docs/hub/spaces-sdks-docker).
120
-
121
- ---
122
-
123
- ## What this means for us (paperhawk-specific notes)
124
-
125
- ### The default Dockerfile vs the paperhawk Dockerfile
126
-
127
- The existing paperhawk Dockerfile is **more advanced** than the default example:
128
-
129
- | Aspect | HF default | Paperhawk |
130
- |---|---|---|
131
- | Python version | `python:3.9` | `python:3.12-slim` (more modern) |
132
- | User setup | `useradd -m -u 1000 user` + `USER user` (non-root, security best-practice) | NONE (root user) |
133
- | OS-deps | none | `tesseract-ocr` + `poppler-utils` + `libmupdf-dev` (PDF + OCR) |
134
- | Pre-download | none | `BAAI/bge-m3` 2.27 GB (build-time) |
135
- | App | `uvicorn` FastAPI | `streamlit` |
136
- | Port | **`7860`** | **`8501`** → **rewritten to 7860 for the HF Space** (2026-05-05) |
137
-
138
- ### The 2 main rewrites needed on the paperhawk Dockerfile
139
-
140
- 1. **Port switch 8501 → 7860** (done, 2026-05-05):
141
- - `EXPOSE 8501` → `EXPOSE 7860`
142
- - `--server.port=8501` → `--server.port=7860`
143
- - `HEALTHCHECK ... http://localhost:8501/_stcore/health` → `http://localhost:7860/_stcore/health`
144
-
145
- 2. **(optional) Add the user setup** as a security best practice:
146
- - `RUN useradd -m -u 1000 user`
147
- - `USER user`
148
- - `ENV PATH="/home/user/.local/bin:$PATH"`
149
- - `COPY --chown=user ...`
150
- - **HF Spaces does NOT strictly require this**, and the paperhawk stack runs fine as root.
151
-
152
- ### The README.md front matter
153
-
154
- HF Spaces requires a YAML front matter at the top of `README.md`. Inserted at the top of the paperhawk `README.md` (2026-05-05):
155
-
156
- ```yaml
157
- ---
158
- title: PaperHawk
159
- emoji: 🦅
160
- colorFrom: red
161
- colorTo: orange
162
- sdk: docker
163
- pinned: false
164
- license: mit
165
- short_description: Real-DI-Audit/14 rules/6 anti-halluc/LangGraph/Qwen/MI300X
166
- ---
167
- ```
168
-
169
- The existing paperhawk `README.md` content (the project README) follows after it. The front matter is only for the HF Space; it also renders on GitHub (which shows the YAML as a code block).
170
-
171
- ### The clone + push workflow for paperhawk
172
-
173
- On the existing paperhawk GitHub repo (`nandorfivince/paperhawk`) we add a new remote:
174
-
175
- ```bash
176
- cd ~/development/<host-paperhawk-path>
177
- git remote add space https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/paperhawk
178
- git push space main
179
- ```
180
-
181
- The first push asks for authentication — it wants the HF Hub token, which can be generated from the Vincsipe account at https://huggingface.co/settings/tokens (new token, "Write" scope).
182
-
183
- ### App port environment variable
184
-
185
- HF Spaces expects port `7860` by default. The paperhawk `streamlit` command is extended with the `--server.port=7860` flag in the `Dockerfile` (2026-05-05).
186
-
187
- ### HF Spaces hardware
188
-
189
- CPU Basic = free tier, 16 GB RAM, 2 vCPU. Plenty for the paperhawk Streamlit app (~3-5 GB RAM usage for bge-m3 + ChromaDB + Streamlit). The vLLM runs **separately** on the AMD MI300X; the Space references it via the `VLLM_BASE_URL` Secret.
190
-
191
- ### Sleep mode
192
-
193
- A free Space goes to sleep after 48 hours of inactivity. The first request after wake-up takes 30-60 sec. During judging it is worth pinging the Space **periodically** (e.g. UptimeRobot at a 30-minute interval), or sharing it in the Build-in-Public posts to keep it awake with organic traffic.
 
docs/HUGGINGFACE_DEPLOYMENT.md ADDED
@@ -0,0 +1,251 @@
 
1
+ # Hugging Face Spaces Deployment
2
+
3
+ How we deployed the PaperHawk Streamlit application as a public Hugging Face Space, with the AMD MI300X vLLM endpoint as its inference backend.
4
+
5
+ ---
6
+
7
+ ## What you get
8
+
9
+ - **Public Space URL** — a Streamlit app anyone can use in a browser, no signup
10
+ - **Free CPU Basic tier** — 16 GB RAM, 2 vCPU. The app runs here; the LLM runs on AMD MI300X via vLLM (separate Cloud).
11
+ - **Two paths**: under the `lablab-ai-amd-developer-hackathon` org (Plan A — qualifies for HF Special Prize), or under your personal account (Plan B — fallback if the org has hardware-quota issues)
12
+
13
+ Live example: <https://huggingface.co/spaces/Vincsipe/paperhawk>
14
+
15
+ ---
16
+
17
+ ## Prerequisites
18
+
19
+ 1. Hugging Face account (free)
20
+ 2. **Optional**: membership in the `lablab-ai-amd-developer-hackathon` org if submitting to the AMD Developer Hackathon (Plan A). The HF Special Prize requires the Space to live under this org.
21
+ 3. A running vLLM endpoint on AMD MI300X — see [`AMD_DEPLOYMENT.md`](AMD_DEPLOYMENT.md)
22
+ 4. The PaperHawk repo cloned locally with `Dockerfile`, `README.md`, and `app/main.py`
23
+
24
+ ---
25
+
26
+ ## Step 1 — Create the Space
27
+
28
+ Go to <https://huggingface.co/new-space> (or, if you're an org member, click `+ New` → `New Space` from the org page).
29
+
30
+ **Configuration**:
31
+
32
+ | Field | Value |
33
+ |---|---|
34
+ | Owner | `lablab-ai-amd-developer-hackathon` (Plan A) or your personal handle (Plan B) |
35
+ | Space name | `paperhawk` |
36
+ | Short description | `Real-DI-Audit/14 rules/6 anti-halluc/LangGraph/Qwen/MI300X` |
37
+ | License | `mit` |
38
+ | **Space SDK** | **Docker** (not Streamlit, not Gradio — see step 2) |
39
+ | **Template** | **Blank** (we ship our own Dockerfile) |
40
+ | Hardware | CPU Basic (free, 16 GB RAM) |
41
+ | Visibility | Public (required for the HF Special Prize) |
42
+
43
+ Click **Create Space**. You'll get an empty repo at:
44
+
45
+ ```
46
+ https://huggingface.co/spaces/<owner>/paperhawk
47
+ ```
48
+
49
+ **Why Docker SDK and not the Streamlit template?** As of 2026, the HF Spaces "Streamlit" SDK lives under the Docker tab as a managed template. We bypass the template because PaperHawk needs custom OS dependencies (Tesseract OCR for EN/HU/DE, poppler-utils for table extraction, libmupdf for PDFs) that the templated builder doesn't include. Our own Dockerfile is faster to debug and gives us a deterministic base image.
50
+
51
+ ---
52
+
53
+ ## Step 2 — Configure the Dockerfile for HF Spaces
54
+
55
+ The PaperHawk Dockerfile is HF-Spaces-ready out of the box, with one critical detail: **port 7860**.
56
+
57
+ ```dockerfile
58
+ # syntax=docker/dockerfile:1.6
59
+ FROM python:3.12-slim AS base
60
+
61
+ ENV PYTHONUNBUFFERED=1 PYTHONDONTWRITEBYTECODE=1
62
+
63
+ # OS deps
64
+ RUN apt-get update && apt-get install -y --no-install-recommends \
65
+ tesseract-ocr tesseract-ocr-eng tesseract-ocr-hun tesseract-ocr-deu \
66
+ poppler-utils libmupdf-dev curl \
67
+ && rm -rf /var/lib/apt/lists/*
68
+
69
+ WORKDIR /app
70
+
71
+ COPY requirements.txt .
72
+ RUN pip install --upgrade pip \
73
+ && pip install --index-url https://download.pytorch.org/whl/cpu torch \
74
+ && pip install -r requirements.txt
75
+
76
+ # Pre-download the embedding model so the first user request isn't slow
77
+ RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('BAAI/bge-m3')"
78
+
79
+ COPY . .
80
+
81
+ # HF Spaces expects port 7860 (NOT Streamlit's default 8501)
82
+ EXPOSE 7860
83
+ CMD ["streamlit", "run", "app/main.py", \
84
+ "--server.address=0.0.0.0", \
85
+ "--server.port=7860", \
86
+ "--server.headless=true"]
87
+ ```
88
+
89
+ **Why 7860?** HF Spaces' Docker hosting only routes traffic to port 7860 — the Streamlit default 8501 is invisible to the public URL. This is a one-line fix that's easy to miss.
90
+
91
+ ---
92
+
93
+ ## Step 3 — Configure the README YAML front-matter
94
+
95
+ HF Spaces reads the YAML block at the top of `README.md` to configure the Space card and build behavior. PaperHawk's:
96
+
97
+ ```yaml
98
+ ---
99
+ title: PaperHawk
100
+ emoji: 🦅
101
+ colorFrom: red
102
+ colorTo: yellow
103
+ sdk: docker
104
+ pinned: false
105
+ license: mit
106
+ short_description: Real-DI-Audit/14 rules/6 anti-halluc/LangGraph/Qwen/MI300X
107
+ ---
108
+ ```
109
+
110
+ **Critical**: `colorTo` must be one of `[red, yellow, green, blue, indigo, purple, pink, gray]`. We initially used `orange` (because the AMD brand color is orange) — HF rejected the YAML as invalid, and the Space card fell back to a generic theme **with the YAML rendered as a Markdown table at the top of the page**. Fixed by changing to `yellow`.
111
+
112
+ If the Space's main page shows a `title | PaperHawk` table at the top, the YAML is invalid and HF can't parse it — check the `colorTo` value first.
113
+
114
+ ---
115
+
116
+ ## Step 4 — Set up Git LFS for binary assets
117
+
118
+ HF Spaces has a strict rule: every binary file (`*.png`, `*.pdf`, `*.pptx`, `*.docx`, `*.jpg`, `*.mp4`) must live in **Xet storage** via Git LFS, not as a regular Git blob. The cover PNG, the slide PDF, the demo packages — all of these get rejected without LFS.
119
+
120
+ On your local machine:
121
+
122
+ ```bash
123
+ # One-time, in any repo with binary files
124
+ sudo apt install git-lfs # or `brew install git-lfs` on macOS
125
+ git lfs install
126
+ ```
127
+
128
+ In the PaperHawk repo:
129
+
130
+ ```bash
131
+ git lfs track "*.png" "*.pdf" "*.pptx" "*.docx" "*.jpeg" "*.jpg" "*.mp4"
132
+ git add .gitattributes
133
+ git commit -m "Track binary files via LFS"
134
+ ```
135
+
136
+ **Important**: `git lfs track` only updates `.gitattributes`. Existing commits with binaries-as-Git-blob are still rejected by HF. Migrate the entire history:
137
+
138
+ ```bash
139
+ git lfs migrate import --include="*.png,*.pdf,*.pptx,*.docx,*.jpeg,*.jpg,*.mp4"
140
+ ```
141
+
142
+ This rewrites the branch history so the binaries become LFS blobs. The next `git push` will upload them via Xet.
143
+
144
+ **Files over 10 MB**: HF Spaces also enforces a 10 MB hard limit per file even via LFS for the free Spaces tier. Any single video over 10 MB will be rejected. If you have demo videos, keep them as separate uploads on YouTube/Vimeo and link from the Space description.
145
+
146
+ ---
147
+
148
+ ## Step 5 — Add the Space as a git remote and push
149
+
150
+ ```bash
151
+ # Add a remote for the Space (token embedded in URL avoids dual auth-prompts)
152
+ HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxx # generate at https://huggingface.co/settings/tokens (Write scope, fine-grained, with org access if Plan A)
153
+ git remote add space https://<your-hf-username>:${HF_TOKEN}@huggingface.co/spaces/<owner>/paperhawk
154
+
155
+ # Push to the Space
156
+ git push --force space main
157
+ ```
158
+
159
+ **Why token in URL?** Git LFS uses a separate authentication channel from the regular Git push. Without the token in the URL, Git prompts for credentials twice and one of them silently times out. Putting the token in the URL handles both.
160
+
161
+ The first push uploads ~9 MB of LFS objects (the cover image, slide PDF, sample PDFs, sample DOCX). Subsequent pushes are fast (cached on HF's side).
162
+
163
+ ---
164
+
165
+ ## Step 6 — Add Space secrets
166
+
167
+ The app reads its LLM provider config from environment variables. In the Space:
168
+
169
+ **Settings** (top-right, on the Space page) → **Variables and secrets** → **+ New variable** for each:
170
+
171
+ | Key | Value | Type |
172
+ |---|---|---|
173
+ | `LLM_PROFILE` | `vllm` | Variable |
174
+ | `VLLM_BASE_URL` | `http://<MI300X_DROPLET_IP>:8000/v1` | Variable |
175
+ | `VLLM_MODEL` | `Qwen/Qwen2.5-14B-Instruct` | Variable |
176
+ | `EMBEDDING_MODEL` | `BAAI/bge-m3` | Variable |
177
+ | `VLLM_API_KEY` | `sk-paperhawk-2026` (the same token you passed to vLLM `--api-key`) | **Secret** |
178
+
179
+ The `VLLM_API_KEY` must be a **Secret**, not a Variable — Secrets are masked in the UI and not exposed via the public Space metadata.
180
+
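+ Inside the container, both Variables and Secrets arrive as plain environment variables, so the app-side read is uniform. An illustrative sketch (the helper name is hypothetical):
+
+ ```python
+ import os
+
+ def llm_settings() -> dict:
+     """Collect the LLM provider config the Space injects as env vars."""
+     return {
+         "profile": os.getenv("LLM_PROFILE", "dummy"),  # safe fallback for local runs
+         "base_url": os.getenv("VLLM_BASE_URL", ""),
+         "model": os.getenv("VLLM_MODEL", ""),
+         "api_key": os.getenv("VLLM_API_KEY", ""),      # the Secret; never log it
+     }
+ ```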
181
+ After saving, the Space rebuilds automatically (~5 minutes for first build, faster for subsequent).
182
+
183
+ ---
184
+
185
+ ## Step 7 — Wait for the build, then verify
186
+
187
+ The first build pulls and installs everything — Python 3.12-slim, OS deps, PyTorch CPU wheel, the BAAI/bge-m3 model (~2.3 GB pre-download), and the rest of `requirements.txt`. Expect 8–15 minutes for the cold build.
188
+
189
+ Watch the build logs in the Space → **Logs** tab. When you see `streamlit run app/main.py` and `You can now view your Streamlit app in your browser` the Space is up.
190
+
191
+ Open the Space URL in a browser and click **Audit Demo**. If the vLLM endpoint is reachable, you'll see results in 20–25 seconds.
192
+
193
+ If you get an error like `Connection refused` or a long hang, check the following (a one-line `curl` check follows the list):
194
+
195
+ 1. The MI300X droplet is running and `vllm serve` is up (SSH in and confirm the `vllm serve` process from `AMD_DEPLOYMENT.md` step 6 is still running)
196
+ 2. The droplet's UFW has port 8000 open (run `ufw status | grep 8000` on the droplet)
197
+ 3. The `VLLM_BASE_URL` in Space Secrets matches the droplet's current public IP (which changes on every recreate-from-snapshot)
198
+
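+ A minimal connectivity check, runnable from any machine (values are the ones from Step 6; substitute your droplet's current IP):
+
+ ```bash
+ # Expect a JSON model list containing Qwen/Qwen2.5-14B-Instruct.
+ # A hang points at UFW/networking (check 2); HTTP 401 points at an API-key mismatch.
+ curl -s http://<MI300X_DROPLET_IP>:8000/v1/models \
+   -H "Authorization: Bearer sk-paperhawk-2026"
+ ```
+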
199
+ ---
200
+
201
+ ## Step 8 — Hide the YAML from the GitHub display (optional)
202
+
203
+ The YAML front-matter is needed for HF Spaces but **looks ugly on GitHub**: the renderer displays it as a bare `key | value` table at the top of the README.
204
+
205
+ Workaround: GitHub honors `.github/README.md` over the root `README.md` for the public repo display. We commit a copy of the README **without** the YAML block as `.github/README.md`:
206
+
207
+ ```bash
208
+ mkdir -p .github
209
+ tail -n +12 README.md > .github/README.md # skip the first 11 lines (the YAML + blank line)
210
+ # (optionally edit .github/README.md to use absolute raw-image URLs for paperhawk.jpeg)
211
+ git add .github/README.md
212
+ git commit -m "Add .github/README.md to hide HF YAML on GitHub display"
213
+ git push origin main
214
+ ```
215
+
216
+ Now GitHub shows `.github/README.md` (clean), and HF Spaces still reads the root `README.md` (with YAML). One repo, two faces.
217
+
218
+ ---
219
+
220
+ ## Plan A vs Plan B
221
+
222
+ | Aspect | Plan A (org Space) | Plan B (personal Space) |
223
+ |---|---|---|
224
+ | Owner | `lablab-ai-amd-developer-hackathon/paperhawk` | `<your-handle>/paperhawk` |
225
+ | HF Special Prize | ✅ Qualifies | ❌ Disqualifies |
226
+ | Org-quota dependency | ⚠️ Yes (shared with other org Spaces) | ❌ Independent |
227
+ | Visibility | Public, on the org page | Public, on your profile |
228
+ | Setup steps | Same as above | Same as above |
229
+
230
+ If the org-quota is exhausted (we hit `null quota limit` 403 errors), the same code, same Dockerfile, same YAML, same env-var setup pushes to a personal Space and runs immediately; the remote swap is sketched below. This was our Plan B safety net during the hackathon.
231
+
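+ The switch itself is two commands (a sketch, reusing the `space` remote and `HF_TOKEN` from Step 5; `<your-handle>` is your personal HF username):
+
+ ```bash
+ git remote set-url space https://<your-handle>:${HF_TOKEN}@huggingface.co/spaces/<your-handle>/paperhawk
+ git push --force space main
+ ```
+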
232
+ ---
233
+
234
+ ## Common pitfalls
235
+
236
+ - **"Build failed: app port 7860 not reachable"**: Your Dockerfile is binding to a different port (probably Streamlit's default 8501). Change `EXPOSE` and `CMD` to use 7860.
237
+ - **YAML rendered as a Markdown table on the Space main page**: The YAML is invalid. Most likely culprits: invalid `colorTo` (allowed: red/yellow/green/blue/indigo/purple/pink/gray, **not** orange), invalid `sdk`, missing `---` opening line, BOM/whitespace before the first `---`.
238
+ - **"binary files require Xet"**: You haven't run `git lfs track` + `git lfs migrate import` yet. The HF push rejects committed binaries that aren't LFS-blobs.
239
+ - **"Files larger than 10 MiB are not allowed"**: A single file is over 10 MB even after LFS. Move it out of the repo and link from the README.
240
+ - **"null quota limit" 403 error**: Org-level hardware quota is exhausted. Wait for capacity, ping a lablab admin in Discord, or push to a personal Space (Plan B).
241
+ - **App loads but "Connection refused" on Audit Demo**: The vLLM endpoint is down or the IP changed. SSH into the droplet and confirm `vllm serve` is running. Update `VLLM_BASE_URL` Secret if the IP rotated.
242
+ - **App loads but "401 Unauthorized" on every LLM call**: The `VLLM_API_KEY` Secret doesn't match the `--api-key` you passed to `vllm serve`. They have to be byte-for-byte identical.
243
+
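+ For the first pitfall, the port-relevant Dockerfile lines look like this (a minimal sketch; the base image and entry path are the ones this guide already mentions, everything else is elided):
+
+ ```dockerfile
+ FROM python:3.12-slim
+ # ... OS deps, COPY, pip install, model pre-download ...
+ EXPOSE 7860
+ CMD ["streamlit", "run", "app/main.py", "--server.port=7860", "--server.address=0.0.0.0"]
+ ```
+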
244
+ ---
245
+
246
+ ## Cross-references
247
+
248
+ - [`docs/AMD_DEPLOYMENT.md`](AMD_DEPLOYMENT.md) — provisioning the AMD MI300X vLLM endpoint that this Space depends on
249
+ - [`docs/ARCHITECTURE.md`](ARCHITECTURE.md) — how the Streamlit app, the LangGraph multi-graph orchestrator, and the vLLM endpoint fit together
250
+ - [`docs/HF_SPACE_DEFAULT_GETTING_STARTED.md`](HF_SPACE_DEFAULT_GETTING_STARTED.md) — the canonical HF Spaces Quick Start that this guide builds on
251
+ - [`docs/SUBMISSION.md`](SUBMISSION.md) — full hackathon submission brief
docs/SUBMISSION.md CHANGED
@@ -19,7 +19,39 @@
19
 
20
  ---
21
 
22
- ## Long Description
23
 
24
  ### The Problem
25
 
@@ -147,23 +179,94 @@ One codebase, one MIT license, three prize pools.
147
  | Project Title | DONE | `PaperHawk` |
148
  | Short Description | DONE | 247 characters, A+C blend |
149
  | Long Description | DONE | 10 sections, builder-energy tone |
150
- | Cover Image | DONE | `paperhawk.jpeg` (2048 × 819 px) |
151
  | Technology & Category Tags | DONE | 12 tags |
152
  | Public GitHub Repository | DONE | `github.com/nandorfivince/paperhawk` |
153
- | Video Presentation | TODO | Demo walkthrough video |
154
- | Slide Presentation | TODO | 5–8 slide deck |
155
- | Demo Application URL | TODO | HF Space public URL |
156
- | HF Space URL | TODO | Under `lablab-ai-amd-developer-hackathon` org |
157
 
158
  ---
159
 
160
  ## Submission URLs (filled at submission time)
161
 
162
  - **GitHub repo**: https://github.com/nandorfivince/paperhawk
163
- - **Hugging Face Space**: *(to be added)*
164
- - **Demo video**: *(to be added)*
165
- - **Slide deck**: *(to be added)*
166
- - **Live application URL**: *(same as HF Space URL)*
167
 
168
  ---
169
 
 
19
 
20
  ---
21
 
22
+ ## Long Description (Submission Form — 600-2000 char limit, copy-paste-ready)
23
+
24
+ > **Use this version when filing the lablab.ai Submission Form Long Description field.** Compact, all key points covered (problem, solution, target audience, USP, performance, market, future), comfortably within the 600-2000 character envelope. Char count: **~1880**.
25
+
26
+ ```
27
+ The Problem
28
+ Audit, legal due diligence, tax compliance, and M&A rely on humans reading dozens of documents looking for errors and red flags. A senior auditor needs ~8 hours per 50-page package. ChatGPT/Copilot/Harvey handle one document at a time, hallucinate citations, and lack jurisdiction-specific compliance knowledge.
29
+
30
+ Our Solution: PaperHawk
31
+ PaperHawk is an agentic multi-document intelligence platform processing 3-50 PDFs simultaneously, detecting cross-document inconsistencies humans miss. It combines:
32
+ - 14 deterministic statutory rules (HU VAT Act §169, ISA 240/320/500, GDPR Art. 28, AML, Ptk. 6:98, Art. 22) hand-coded in Python
33
+ - 6-layer anti-hallucination stack (temperature=0, source quotes, confidence scores, plausibility, LLM-risk filters, quote validator)
34
+ - Multi-agent LangGraph orchestration (4 graphs + 6 subgraphs, 5-tool agentic chat)
35
+ - Cross-document red flag detection (e.g. 57.5% price drift across 3 invoices auto-detected)
36
+
37
+ Target Audience
38
+ Auditors, lawyers, tax advisors, DD analysts, compliance officers, CFOs, forensic accountants, banking risk teams. EU + Hungarian focus initially.
39
+
40
+ Why We Win (vs Harvey, ChatPwC, OWL, Copilot)
41
+ These tools handle ONE document well. We handle MANY together — three-way matching, cross-doc consistency, package-level red flags. Plus jurisdiction-specific compliance rules hard-coded, not prompt-engineered. Open-source MIT, self-hostable on AMD MI300X.
42
+
43
+ Performance
44
+ 23.3 sec for 3-document audit (61.7x faster than manual). Qwen 2.5 14B Instruct on AMD MI300X via vLLM (307 t/s prompt, 252 t/s generation, 30.4% prefix cache hit rate).
45
+
46
+ Market & Future
47
+ EU professional services market ~$280B TAM, document workflows ~$45B SAM, HU/CEE audit beachhead ~$2B SOM. Roadmap: NAV eAFA integration, fraud detection (Benford's Law), partner risk scoring, human-in-the-loop M2M validation. SaaS revenue ($500-2k/seat/month) + on-prem enterprise for Big Four.
48
+ ```
49
+
50
+ ---
51
+
52
+ ## Extended Reference Material — Long Description Source (NOT for Submission Form)
53
+
54
+ > The 10-section detailed write-up below is the **source material** for the demo video voiceover, the slide deck (`docs/slides/PaperHawk_Slides.pdf`), and the technical walkthrough README. **Do not paste this into the Submission Form** — it would exceed the 2000-char limit several times over. Keep it here as the canonical "what we built" reference.
55
 
56
  ### The Problem
57
 
 
179
  | Project Title | DONE | `PaperHawk` |
180
  | Short Description | DONE | 247 characters, A+C blend |
181
  | Long Description | DONE | 10 sections, builder-energy tone |
182
+ | Cover Image | DONE | `docs/slides/01_cover.png` (1280 × 720, 16:9) |
183
+ | Slide Presentation | DONE | `docs/slides/PaperHawk_Slides.pdf` (10 slides) |
184
  | Technology & Category Tags | DONE | 12 tags |
185
  | Public GitHub Repository | DONE | `github.com/nandorfivince/paperhawk` |
186
+ | Live HF Space — `Vincsipe/paperhawk` (Plan-B) | DONE | Validated end-to-end 2026-05-05 |
187
+ | Live HF Space — `lablab-ai-amd-developer-hackathon/paperhawk` (Plan-A) | BLOCKED | Org-quota issue, ticket pending |
188
+ | Build-in-Public Posts | TODO at posting time | 4 drafts ready in `docs/social-posts/` |
189
+ | Video Presentation | TODO | Demo walkthrough video (max 3 min) |
190
+ | AMD Developer Experience Feedback | DONE | See section below |
191
+
192
+ ---
193
+
194
+ ## Live Deployment Validation (2026-05-05)
195
+
196
+ An end-to-end live test of the full stack succeeded on the morning of **2026-05-05**, with the following measured results:
197
+
198
+ | Metric | Value |
199
+ |---|---|
200
+ | Audit Demo processing time (3 PDFs) | **23.3 seconds** |
201
+ | Speedup vs manual auditor (24 min estimate) | **61.7×** |
202
+ | vLLM cold-start from snapshot (HF cache preserved) | **~30 seconds** (vs 70 sec clean install) |
203
+ | Prompt throughput | **307 tokens/sec** |
204
+ | Generation throughput | **252 tokens/sec** |
205
+ | Prefix cache hit rate | **30.4%** |
206
+ | Cross-document red flag detected | **57.5% price drift** (78,740 → 124,016 Ft over 3 invoices) |
207
+ | Anti-hallucination quote validator | Caught 4 of 6 hallucinated citations, downgraded confidence |
208
+ | Jurisdictional standards applied | HU VAT Act §169, ISA 500, ISA 320 |
209
+
210
+ The full pipeline ran from a publicly deployed Hugging Face Space (`Vincsipe/paperhawk`) through to the AMD MI300X vLLM endpoint and back, with all 14 deterministic domain checks executing and the package-level cross-doc analyzer correctly identifying the price-drift red flag without human prompting.
211
+
212
+ **Recorded outputs**: 4 screenshots of the winning run (`Screenshot from 2026-05-05 10-07-{15,22,31,37}.png`), usable in the submission video and slides.
213
+
214
+ ---
215
+
216
+ ## AMD Developer Experience Feedback
217
+
218
+ Our team had a generally positive experience deploying our agentic document intelligence platform on AMD's stack. Key feedback by component:
219
+
220
+ ### ROCm 7.0
221
+
222
+ The vLLM 0.17.1 + ROCm 7.0 build was stable out of the box on the Quick Start image. Qwen 2.5 14B Instruct loaded in 17.4 sec to MI300X VRAM (27.6 GB model + 141 GB available KV cache), CUDA graph compilation took 20.5 sec, total cold-start ~70 sec. Production-grade throughput: 307 tokens/sec prompt, 252 tokens/sec generation, 30.4% prefix cache hit rate. The OpenAI-compatible REST endpoint at port 8000 worked transparently. We did not need any ROCm-specific code changes from our development setup — vLLM abstracted everything. **Recommendation**: keep the Quick Start vLLM image fresh; it saved us hours of setup.
223
+
224
+ ### AMD Developer Cloud (DigitalOcean-powered)
225
+
226
+ **Strengths**:
227
+
228
+ - $1.99/hour MI300X pricing is fair and predictable
229
+ - The Quick Start vLLM image saved hours of setup (Docker + ROCm + vLLM pre-installed, JupyterLab launched on port 80)
230
+ - 192 GB HBM3 + 141 GB available KV cache — lots of headroom for large-context multi-agent workloads
231
+ - Snapshot-and-destroy workflow excellent for cost control: $0.32/day storage for ~96 GB snapshot, 5-10 min recreate from snapshot, HF model cache preserved inside the Docker container layer means warm restart is ~30 sec instead of cold-start 70 sec
232
+ - Auto-destroy on credit runout (when no payment method) is a built-in safety net we appreciated
233
+ - Free $100 promo credit makes the platform genuinely accessible to hackathon participants
234
+
235
+ **Pain points and UI improvement opportunities**:
236
+
237
+ 1. Sidebar `GPU Droplets` link in the left navigation routes to the CPU Droplet flow (a clear UI bug — workaround is the homepage `Create a GPU Droplet` card or the top-right `Create` dropdown). We hit this twice in our first hour.
238
+ 2. Default region NYC1 was 'out of capacity' for MI300X plan — we had to switch to ATL1 via URL parameter (`?region=atl1`). The region selector on the GPU Droplet creation page does not appear to be exposed in the UI; we found the workaround by inspecting the URL of a successful creation. Adding region availability indicators on the GPU Plan selector would help.
239
+ 3. Reboot after `apt-get upgrade` (recommended via Security notice) does not auto-restart the `rocm` Docker container; we needed `docker start rocm` manually (sketch below). Worth documenting in the Quick Start onboarding.
240
+
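+ The one-off fix, plus a restart policy that would make it stick (the container name `rocm` comes from the Quick Start image; the policy change is our suggestion, not something we verified on the image):
+
+ ```bash
+ docker start rocm                             # one-off, after the upgrade reboot
+ docker update --restart unless-stopped rocm   # would survive future reboots
+ ```
+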
241
+ ### AMD APIs
242
+
243
+ We did not use the lower-level ROCm APIs or AMD-specific SDKs directly. Our stack was vLLM + OpenAI-compatible REST, so all hardware-specific work was abstracted away through standard Python tooling. This is actually a strength: we ran a production-grade PaperHawk pipeline (originally developed against the Anthropic Claude API) on AMD MI300X with **zero application code changes**, proving the AMD stack via vLLM is a real drop-in alternative for production AI workloads. We changed only environment variables (`LLM_PROFILE`, `VLLM_BASE_URL`, `VLLM_API_KEY`, `VLLM_MODEL`); the full swap is sketched below.
244
+
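+ The entire swap, expressed as the env config we deployed with (same values as the Space variables in the deployment guide):
+
+ ```bash
+ export LLM_PROFILE=vllm
+ export VLLM_BASE_URL=http://<MI300X_DROPLET_IP>:8000/v1
+ export VLLM_MODEL=Qwen/Qwen2.5-14B-Instruct
+ export VLLM_API_KEY=sk-paperhawk-2026   # must match vLLM's --api-key byte-for-byte
+ ```
+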
245
+ ### Overall verdict
246
+
247
+ AMD MI300X via the Developer Cloud is a viable production deployment platform for agentic LLM applications. The Quick Start vLLM image is a major time-saver. The few UI bugs and capacity-region issues are minor compared to the platform's strengths. The combination of $1.99/hour MI300X pricing + snapshot-restore workflow + OpenAI-compatible vLLM endpoint makes this a credible alternative to AWS p4d/p5 or GCP A3 for inference workloads, especially at the price point.
248
 
249
  ---
250
 
251
  ## Submission URLs (filled at submission time)
252
 
253
+ ### Plan-A (lablab-org admin responded) — preferred
254
+
255
  - **GitHub repo**: https://github.com/nandorfivince/paperhawk
256
+ - **Hugging Face Space (official)**: https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/paperhawk
257
+ - **Live application URL**: same as HF Space URL above
258
+ - **Slide deck**: `docs/slides/PaperHawk_Slides.pdf`
259
+ - **Demo video**: *(uploaded at submission time)*
260
+
261
+ ### Plan-B (lablab-org quota unresolved) — fallback
262
+
263
+ - **GitHub repo**: https://github.com/nandorfivince/paperhawk
264
+ - **Hugging Face Space (working, parallel)**: https://huggingface.co/spaces/Vincsipe/paperhawk
265
+ - **Live application URL**: same as HF Space URL above
266
+ - **Slide deck**: `docs/slides/PaperHawk_Slides.pdf`
267
+ - **Demo video**: *(uploaded at submission time)*
268
+
269
+ **Plan-B trade-off**: HF Special Prize (Reachy Mini robot + HF PRO + $500 credits) requires the Space to be under the `lablab-ai-amd-developer-hackathon` org. If we ship under `Vincsipe/paperhawk`, we forfeit the HF Special Prize but retain qualification for the four main judging criteria (Presentation, Business Value, Application of Technology, Originality).
270
 
271
  ---
272
 
docs/hf-space-deployment.md DELETED
@@ -1,124 +0,0 @@
1
- # Hugging Face Space deployment
2
-
3
- The Streamlit app deploys to a **Hugging Face Space** under the
4
- `lablab-ai-amd-developer-hackathon` organization. This is **mandatory** for
5
- the Hugging Face Special Prize and convenient as the public demo URL.
6
-
7
- ## 1. Prerequisites
8
-
9
- - Hugging Face account
10
- - Membership in the **AMD Developer Hackathon** HF organization
11
- ([join here](https://huggingface.co/login?next=%2Forganizations%2Flablab-ai-amd-developer-hackathon%2Fshare%2FELARrxoRIHvseSHRhANJYFEZQazsQIYhJf))
12
- - A running vLLM endpoint on the AMD MI300X (see `qwen-vllm-deployment.md`)
13
-
14
- ## 2. Create the Space
15
-
16
- 1. Hugging Face → Spaces → New Space
17
- 2. Owner: `lablab-ai-amd-developer-hackathon`
18
- 3. Space name: `paperhawk`
19
- 4. License: MIT
20
- 5. SDK: **Streamlit**
21
- 6. Hardware: **CPU basic** (free) — vLLM runs on MI300X, the Space only hosts the UI
22
-
23
- ## 3. Push the code
24
-
25
- ```bash
26
- git remote add space https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/paperhawk
27
- git push space main
28
- ```
29
-
30
- The Space auto-builds from the repo using `requirements.txt` and runs
31
- `app.py` (or, in our layout, configures Streamlit to start `app/main.py`).
32
-
33
- ## 4. Set Space env vars
34
-
35
- In the Space → Settings → Variables and secrets, add:
36
-
37
- ```
38
- LLM_PROFILE=vllm
39
- VLLM_BASE_URL=http://<mi300x-public-ip>:8000/v1
40
- VLLM_MODEL=Qwen/Qwen2.5-14B-Instruct
41
- VLLM_API_KEY=<the api key you set on the vLLM server>
42
- EMBEDDING_MODEL=BAAI/bge-m3
43
- ```
44
-
45
- Mark `VLLM_API_KEY` as a **secret** (not a regular variable).
46
-
47
- ## 5. Space front-matter
48
-
49
- Edit the `README.md` to start with the HF Spaces front-matter:
50
-
51
- ```yaml
52
- ---
53
- title: Document Intelligence (AMD Edition)
54
- emoji: 🔍
55
- colorFrom: red
56
- colorTo: yellow
57
- sdk: streamlit
58
- sdk_version: 1.40.0
59
- app_file: app/main.py
60
- pinned: false
61
- license: mit
62
- short_description: Multi-document due diligence with LangGraph + Qwen on AMD MI300X
63
- tags:
64
- - langgraph
65
- - agentic
66
- - rag
67
- - qwen
68
- - amd
69
- - document-intelligence
70
- ---
71
- ```
72
-
73
- (The current README.md is the project README; this front-matter goes on top
74
- when the repo is mirrored to the HF Space.)
75
-
76
- ## 6. Verify the Space
77
-
78
- After the build finishes (~3-5 minutes):
79
-
80
- 1. Open `https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/paperhawk`
81
- 2. Click the **Audit Demo** button → it should run end-to-end and produce
82
- risks + a report.
83
- 3. Open the **Chat** tab → ask a question → the answer should include
84
- `[Source: filename.pdf]` citations.
85
-
86
- ## 7. Resource tier
87
-
88
- The free CPU basic tier (16 GB RAM, 2 vCPU) handles:
89
-
90
- - BGE-m3 embedding (~2.3 GB on first load)
91
- - ChromaDB (small index)
92
- - Streamlit UI
93
-
94
- The vLLM model runs on the MI300X, **not** here. The Space just renders the
95
- UI and proxies requests to the vLLM endpoint.
96
-
97
- If the free tier is too tight on memory, upgrade to **CPU upgrade** ($0.03/h).
98
-
99
- ## 8. Sleep mode mitigation
100
-
101
- A free Space sleeps after 48 hours of inactivity. The first request after
102
- sleep takes ~30-60 seconds to wake. Mitigations:
103
-
104
- - Share the Space link in your Build-in-Public posts → continuous traffic →
105
- less likely to sleep.
106
- - Set up a 30-minute external ping (e.g. UptimeRobot) the day before
107
- judging.
108
-
109
- ## 9. The HF Special Prize is like-driven
110
-
111
- Once the Space is live:
112
-
113
- 1. Share the URL on X / LinkedIn (tag `@lablab` and `@AIatAMD`).
114
- 2. Ask your followers to like the Space.
115
- 3. The Space with the most likes at the end of the hackathon wins:
116
- - 1st: Reachy Mini Wireless robot + 6 months HF PRO + $500 HF credit
117
- - 2nd: 3 months HF PRO + $300 credit
118
- - 3rd: 2 months HF PRO + $200 credit
119
-
120
- ## 10. Submission to lablab
121
-
122
- When submitting on lablab.ai, paste the Space URL into the **Application
123
- URL** and **Hugging Face Space link** fields. This is mandatory for the HF
124
- prize qualification.

docs/qwen-vllm-deployment.md DELETED
@@ -1,68 +0,0 @@
1
- # Qwen on AMD MI300X — vLLM deployment
2
-
3
- This guide covers the production deployment path: running Qwen 2.5 Instruct
4
- (14B or 32B) via [vLLM](https://github.com/vllm-project/vllm) on an
5
- **AMD Instinct MI300X** through the AMD Developer Cloud, with the Streamlit
6
- app calling the vLLM endpoint over the OpenAI-compatible REST API.
7
-
8
- For the canonical step-by-step (including the docker run command and a
9
- benchmark table), see [`infra/vllm/README.md`](../infra/vllm/README.md).
10
-
11
- ## Why this stack?
12
-
13
- - **Open source LLM** — Qwen 2.5 is Apache-2 licensed; safe for the MIT
14
- open-source license here, and a partner-prize bonus on the hackathon.
15
- - **Multilingual** — Qwen 2.5 handles HU/DE/EN well, which matters for our
16
- multilingual demo data.
17
- - **AMD-native** — vLLM has a ROCm build (`rocm/vllm:latest`) optimized for
18
- the MI300X. No CUDA, no NVIDIA dependency.
19
- - **OpenAI-compatible API** — `langchain-openai`'s `ChatOpenAI` adapter
20
- works out of the box with a custom `base_url`. Tool-calling, structured
21
- output, and streaming all behave the same as the public OpenAI endpoint.
22
- - **No vendor lock-in** — the same code runs against Ollama (locally) and
23
- against any OpenAI-compatible inference server.
24
-
25
- ## Cost monitoring
26
-
27
- AMD Developer Cloud pricing (May 2026 ballpark):
28
-
29
- - ~$4-8/hour pay-as-you-go for an MI300X instance.
30
- - Each team member gets `$100` in cloud credits → ~20 hours of MI300X uptime
31
-   at $5/h. With 3 team members, ~60 hours total.
32
-
33
- **Discipline:**
34
-
35
- 1. Only run during demo / test / build sessions; **stop the instance when
36
- idle**.
37
- 2. Keep one teammate's credit untouched as a final-day buffer.
38
- 3. Run end-to-end smoke tests early — a hot fix on deadline day burns hours
39
- you can't get back.
40
-
41
- ## Plan B: Ollama fallback
42
-
43
- If the AMD credit doesn't arrive in time, or the MI300X has a network issue
44
- on demo day:
45
-
46
- ```bash
47
- LLM_PROFILE=ollama OLLAMA_MODEL=qwen2.5:7b-instruct streamlit run app/main.py
48
- ```
49
-
50
- Pull the model first:
51
-
52
- ```bash
53
- ollama pull qwen2.5:7b-instruct
54
- ```
55
-
56
- Quality drops (7B vs 14B/32B), but the demo flow stays alive on a laptop
57
- GPU or even CPU.
58
-
59
- ## Production hardening (post-hackathon)
60
-
61
- For an actual production deployment beyond the hackathon scope:
62
-
63
- - TLS termination (Caddy / Nginx in front of vLLM)
64
- - API-key rotation (`--api-key` flag with a periodic rotation script)
65
- - Prometheus + Grafana on vLLM `/metrics`
66
- - `--quantization fp8` to fit a larger model on smaller hardware
67
- - `--enable-prefix-caching` for repeated long system prompts
68
- - Multi-GPU / multi-region scaling via SkyPilot or vLLM Production Stack

docs/social-posts/post-1-build-window-opens.md DELETED
@@ -1,165 +0,0 @@
1
- # Build in Public · Post 1 — Build Window Opens
2
-
3
- **Timing**: post on or just after the AMD Hackathon kick-off (May 4, 6:00 PM CEST).
4
- **Order**: post on **X first**, then LinkedIn ~30 minutes later.
5
- **Why**: X moves fast, LinkedIn rewards a slightly longer-form follow-up.
6
-
7
- This is the first of three planned Build-in-Public posts:
8
-
9
- 1. **Post 1** (this file) — build window opens · stack-introduction · GitHub link
10
- 2. **Post 2** (mid-week, ~May 7-8) — technical deep-dive on one design choice (LangGraph Send-API parallelism for the deterministic check fan-out)
11
- 3. **Post 3** (May 10, after submit) — final demo · HF Space · pitch-recap
12
-
13
- Mandatory tags ([per the official Build in Public requirement](https://lablab.ai/event/amd-developer-hackathon)):
14
-
15
- | Platform | Required tags |
16
- |---|---|
17
- | X | `@lablab` + `@AIatAMD` |
18
- | LinkedIn | `lablab.ai` + `AMD Developer` (showcase pages) |
19
-
20
- ---
21
-
22
- ## Variant A — X (Twitter)
23
-
24
- > Character budget: 280 — version below uses 269 chars including handles + hashtags.
25
-
26
- ```
27
- Build window opens.
28
-
29
- Putting our LangGraph-native, multi-agent document intelligence
30
- platform on AMD Instinct MI300X for the @AIatAMD x @lablab
31
- hackathon.
32
-
33
- Qwen 2.5 14B on vLLM. 14 deterministic domain checks. 5+1
34
- anti-halluc layers. MIT, public.
35
-
36
- → github.com/nandorfivince/paperhawk
37
-
38
- #AMDHackathon #BuildInPublic
39
- ```
40
-
41
- ### X variant alternatives (in case the first doesn't fit)
42
-
43
- **Punchy / 240 char:**
44
-
45
- ```
46
- PaperHawk — multi-agent document intelligence on @AIatAMD MI300X.
47
-
48
- Qwen 2.5 14B + LangGraph 0.6 + 14 deterministic domain checks.
49
- Build window starts now for the @lablab hackathon.
50
-
51
- Open source · MIT · public repo.
52
-
53
- → github.com/nandorfivince/paperhawk
54
-
55
- #AMDHackathon #BuildInPublic
56
- ```
57
-
58
- **Tech-detail / 270 char:**
59
-
60
- ```
61
- We built PaperHawk: 4 LangGraph graphs, 6 subgraphs, 14
62
- deterministic domain checks, multi-agent DD assistant.
63
-
64
- Now porting it to @AIatAMD Instinct MI300X via vLLM for the
65
- @lablab hackathon.
66
-
67
- Qwen 2.5 14B inside. MIT, public.
68
-
69
- → github.com/nandorfivince/paperhawk
70
-
71
- #AMDHackathon #BuildInPublic
72
- ```
73
-
74
- ---
75
-
76
- ## Variant B — LinkedIn (long form)
77
-
78
- > Character budget: 3000. Version below is ~1280 chars + tags. Reads as a proper builder-energy update for technical recruiters and AI-engineering peers.
79
-
80
- ```
81
- Build window opens.
82
-
83
- For the next week we're putting PaperHawk — our LangGraph-native,
84
- multi-agent document intelligence platform — on AMD Instinct MI300X
85
- GPUs for the AMD Developer Hackathon × lablab.ai.
86
-
87
- The premise is simple: most "document AI" today is RAG with extra
88
- steps. Retrieve a passage, summarize it, hope it's right. That's
89
- fine for FAQ chatbots. It's not fine for auditors, due-diligence
90
- teams, or anyone who has to cross-reference a folder of contracts
91
- and invoices and trust the answer.
92
-
93
- PaperHawk is built for the second case:
94
-
95
- → 4 compiled LangGraph 0.6 graphs (pipeline / chat / DD / package)
96
- → 14 deterministic domain checks (ISA 240/500/320, GDPR Article 28,
97
- Incoterms 2020, AML sanctions)
98
- → 5+1 anti-hallucination layers — every LLM claim must cite a
99
- verbatim quote from the document, or it gets dropped
100
- → 5-tool agentic chat with strict [Source: filename.pdf] citations
101
- → Multi-agent DD assistant: 4 specialists + supervisor + synthesizer
102
-
103
- Stack:
104
- → Qwen 2.5 14B Instruct served via vLLM on AMD MI300X (ROCm)
105
- → BAAI/bge-m3 multilingual embeddings
106
- → Streamlit 5-tab UI, deployable as a Hugging Face Space
107
- → MIT licensed, English-first, multilingual fallback
108
-
109
- Three of us have shipped together for nearly a decade. We're not
110
- new to building things. We're using this hackathon to put our
111
- agentic DI platform on AMD's open compute stack and see how far it
112
- goes.
113
-
114
- We'll be sharing a technical walkthrough mid-week — including why
115
- LangGraph's Send-API parallelism beat sequential domain dispatch in
116
- our benchmarks.
117
-
118
- Repo (public): https://github.com/nandorfivince/paperhawk
119
-
120
- #AMDHackathon #BuildInPublic #LangGraph #Qwen #AMDInstinct #lablab
121
- ```
122
-
123
- **Don't forget**: in the LinkedIn post composer, **tag the company pages**:
124
-
125
- - `lablab.ai` → https://www.linkedin.com/company/lablab-ai/
126
- - `AMD Developer` (showcase page) → https://www.linkedin.com/showcase/amd-developer/
127
-
128
- These appear as `@lablab.ai` and `@AMD Developer` in the post — LinkedIn auto-completes them when you start typing.
129
-
130
- ---
131
-
132
- ## Image / media to attach
133
-
134
- For both X and LinkedIn, attach **one image**: the cover slide from the deck.
135
-
136
- ```bash
137
- # Generate it from slides.html (see docs/slides/README.md for the script):
138
- python -c "<<see docs/slides/README.md cover-PNG snippet>>"
139
- # Output: docs/slides/01_cover.png
140
- ```
141
-
142
- Alternative for X (which compresses heavily): use the `paperhawk.jpeg` directly — it's already wide-format (2048×819) and reads well on mobile.
143
-
144
- ---
145
-
146
- ## Posting checklist
147
-
148
- | Step | Status |
149
- |---|---|
150
- | Cover image generated (`docs/slides/01_cover.png`) | TODO before posting |
151
- | GitHub repo public + README hero visible | DONE |
152
- | `@lablab` + `@AIatAMD` typed correctly on X | TODO at post-time |
153
- | `lablab.ai` + `AMD Developer` company pages tagged on LinkedIn | TODO at post-time |
154
- | Repo URL works in private/incognito browser (sanity-check public visibility) | TODO before posting |
155
- | `#AMDHackathon` `#BuildInPublic` hashtags both included | DONE |
156
-
157
- ---
158
-
159
- ## What this post is NOT
160
-
161
- - Not a marketing pitch. It's a technical announcement.
162
- - Not "we hope to win". It's "we built this, here's what it does, watch this space."
163
- - Not asking for likes. The HF Space is where like-voting happens (different track / different prize).
164
-
165
- The job of this post: **plant a flag**. We're building. We're public. We've shipped together before. Now we're doing it on AMD GPUs.