---
title: PaperHawk
emoji: 🦅
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
license: mit
short_description: Real-DI-Audit/14 rules/6 anti-halluc/LangGraph/Qwen/MI300X
---

# PaperHawk

**Agentic document intelligence on AMD MI300X.** Multi-document due diligence with deterministic compliance rules and a 6-layer anti-hallucination stack.


Built for the AMD Developer Hackathon × lablab.ai (May 2026).


## What is PaperHawk?

PaperHawk is an agentic multi-document intelligence platform for auditors, lawyers, tax advisors, and DD analysts. It processes 3–50 PDFs simultaneously and detects cross-document red flags humans miss — like a 57.5% price drift across three invoices from the same supplier — using a multi-agent LangGraph orchestration on top of Qwen 2.5 14B Instruct served via vLLM on AMD Instinct MI300X.

It is not a chatbot. It is a typed-state, multi-graph reasoning system with deterministic compliance rules, verbatim source citations, and a quote validator that catches LLM hallucinations before they reach the user.

## Why it matters

A senior auditor needs ~8 hours to thoroughly review a 50-page invoice/contract package. ChatGPT, Copilot, and Harvey handle one document at a time, hallucinate citations, and lack jurisdiction-specific compliance knowledge. PaperHawk handles the entire package, applies 14 statutory rules hand-coded in Python, and finishes a 3-document audit in 23.3 seconds (61.7× faster than manual review) — with auditor-grade citations and ISA/GDPR/HU-VAT mappings.


## Technical highlights

- Multi-agent LangGraph 0.6 orchestration — 4 compiled graphs (pipeline, chat, DD, package_insights) + 6 reusable subgraphs with Send-API parallelism
- 5-tool agentic chat with strict `[Source: filename.pdf]` citations validated by a post-processor (no provenance → no answer)
- 6-layer anti-hallucination stack — temperature=0, verbatim source quotes, field-level confidence, plausibility validators, 3-stage LLM-risk filter chain, quote validator
- Provider abstraction with configurable_alternatives — vLLM (production) / Ollama (local dev) / dummy (CI) — swap with one env var, zero code changes
- AMD Instinct MI300X via vLLM — 192 GB HBM3, 27.6 GB model + 141 GB available KV cache, 307 t/s prompt + 252 t/s generation, 30.4% prefix cache hit rate
- 61.7× speedup vs manual audit on a 3-document package (23.3 sec vs ~24 min)
- Hugging Face Space deployable with Docker SDK + Git LFS for binary assets
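The env-var provider swap can be sketched as a minimal registry of factories. This is an illustrative stand-in, not the project's actual module (which uses LangChain's configurable_alternatives); the factory names and placeholder return strings are assumptions:

```python
import os

def make_dummy():
    # Deterministic stub used in CI: no network, no model weights.
    return lambda prompt: "[dummy] deterministic test response"

def make_ollama():
    # Local-dev backend; in the real system this would call an Ollama server.
    model = os.environ.get("OLLAMA_MODEL", "qwen2.5:14b-instruct")
    return lambda prompt: f"[ollama:{model}] placeholder response"

def make_vllm():
    # Production backend; in the real system this would call a vLLM endpoint.
    endpoint = os.environ.get("VLLM_ENDPOINT", "http://localhost:8000/v1")
    return lambda prompt: f"[vllm:{endpoint}] placeholder response"

PROVIDERS = {"dummy": make_dummy, "ollama": make_ollama, "vllm": make_vllm}

def get_llm():
    """Select the LLM backend from LLM_PROFILE, defaulting to dummy."""
    profile = os.environ.get("LLM_PROFILE", "dummy")
    return PROVIDERS[profile]()
```

Because every call site goes through `get_llm()`, switching `LLM_PROFILE=dummy` to `LLM_PROFILE=vllm` changes the backend without touching graph code.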

## Domain highlights

- 14 deterministic statutory rules hand-coded in Python (NOT prompt-engineered) — ISA 240/320/500 audit standards, HU VAT Act §169 mandatory invoice elements, Ptk. 6:98 disproportionate penalty clauses, Art. 22 tax-ID validation, GDPR Article 28 sub-processor language, Incoterms 2020, AML sanctions list (EU/OFAC fuzzy match)
- Cross-document red flag detection — three-way matching (invoice + delivery note + PO), package-level pricing anomalies, duplicate-invoice detection (ISA 240), change-of-control trigger detection (M&A DD)
- Multi-agent DD assistant — 4 specialists (audit / legal / compliance / financial) coordinated by a supervisor and a synthesizer for executive summaries
- Auditor-grade citations — every finding maps to a regulation source (HU VAT Act §169, ISA 500, GDPR Art. 28, etc.) with verbatim source quote
- Multilingual ingest — EN / HU / DE OCR via Tesseract, native PDF + DOCX, vision-first scanned-PDF fallback
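As one concrete example of a deterministic cross-document rule, a package-level price-drift check (the kind of check behind the 57.5% flag in the demo) can be sketched in plain Python. The invoice schema and the 20% threshold below are illustrative assumptions, not the project's actual rule code:

```python
def price_drift(prices):
    """Relative spread between the lowest and highest unit price
    (0.575 means 57.5% drift)."""
    lo, hi = min(prices), max(prices)
    return (hi - lo) / lo

def flag_price_drift(invoices, threshold=0.20):
    """Group invoices by (supplier, item) and flag any group whose
    unit-price drift exceeds the threshold. Each invoice is a dict with
    'supplier', 'item', 'unit_price' keys (illustrative schema)."""
    groups = {}
    for inv in invoices:
        key = (inv["supplier"], inv["item"])
        groups.setdefault(key, []).append(inv["unit_price"])
    flags = []
    for (supplier, item), prices in groups.items():
        if len(prices) >= 2 and price_drift(prices) > threshold:
            flags.append({"supplier": supplier, "item": item,
                          "drift": round(price_drift(prices), 3)})
    return flags
```

With three invoices quoting the same item at 40.0, 52.0 and 63.0, the drift is (63 − 40) / 40 = 57.5%, which exceeds the 20% threshold and is flagged. Because the rule is ordinary Python, its output is reproducible and auditable, unlike a prompt-engineered check.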

## Try the live demo

Public Hugging Face Space (no signup, runs in browser):

https://huggingface.co/spaces/Vincsipe/paperhawk

Click **Audit Demo** in the Quick demo section. Three pre-bundled invoices process in ~25 seconds, and you'll see the cross-document 57.5% price-drift flag, the 14 deterministic checks, and the auditor-grade citations.

Backed by an AMD MI300X vLLM endpoint serving Qwen 2.5 14B Instruct.


## Run it locally

Two options depending on whether you have a GPU or just want a quick smoke test.

### Quick demo (~3 minutes, no GPU needed)

Uses the deterministic dummy provider — runs the full pipeline, all 14 domain checks, and the multi-agent orchestration without any LLM calls. Good for verifying the system runs end-to-end.

```bash
git clone https://github.com/nandorfivince/paperhawk
cd paperhawk
make install
LLM_PROFILE=dummy make dev
```

Open http://localhost:8501 and click the **Audit Demo** button. Results appear in ~5 seconds (the dummy provider returns deterministic test data).

### Full demo (~10 minutes, ~16 GB VRAM recommended)

Uses Ollama with Qwen 2.5 14B Instruct (the same model we deployed to AMD MI300X via vLLM). On a consumer GPU like NVIDIA RTX 4090 / RTX PRO 4500 (32 GB VRAM) you'll see real, production-grade multi-agent reasoning.

```bash
git clone https://github.com/nandorfivince/paperhawk
cd paperhawk
make install

# Pull the model (one-time, ~9 GB download)
ollama pull qwen2.5:14b-instruct

# Run the app pointed at Ollama
LLM_PROFILE=ollama OLLAMA_MODEL=qwen2.5:14b-instruct \
  streamlit run app/main.py --server.port=8501 --server.fileWatcherType=none
```

Open http://localhost:8501 and click the **Audit Demo** button.

Expected results on an RTX PRO 4500 (32 GB GDDR7):

- Audit Demo: ~80 seconds for 3 invoices, 17.5× speedup vs manual
- 8 risk findings (2 HIGH, 4 MEDIUM, 2 LOW), HU VAT Act §169 mappings
- Cross-doc package-level analyzer flags the 57.5% price-drift red flag
- Quote validator catches 4 of 6 hallucinated citations and downgrades them to low confidence

(On AMD MI300X via vLLM: ~23 seconds, 61.7× speedup. 5× faster than Ollama on consumer GPU.)
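The quote validator's core contract (a claimed verbatim quote must actually appear in the source document, or the finding is downgraded) can be sketched as follows. The finding schema and the whitespace normalization are illustrative assumptions, not the project's actual implementation:

```python
import re

def normalize(text):
    """Collapse whitespace and lowercase so minor formatting
    differences don't cause false rejections."""
    return re.sub(r"\s+", " ", text).strip().lower()

def validate_quotes(findings, source_text):
    """Mark each finding verified or not, and downgrade any finding
    whose 'quote' is not verbatim in the source. Each finding is a
    dict with 'quote' and 'confidence' keys (illustrative schema)."""
    haystack = normalize(source_text)
    for f in findings:
        if normalize(f["quote"]) in haystack:
            f["quote_verified"] = True
        else:
            # Hallucinated or paraphrased quote: keep the finding,
            # but it can no longer carry auditor-grade confidence.
            f["quote_verified"] = False
            f["confidence"] = "low"
    return findings
```

Running this as a post-processing step means a fabricated citation is caught deterministically, after generation, regardless of how convincing the LLM's phrasing is.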

### Docker compose (alternative)

```bash
make run-local
```

Spins up the Streamlit app + Ollama in containers. First run pulls the model (~9 GB).


## Documentation

| Document | What it covers |
|---|---|
| `docs/ARCHITECTURE.md` | LangGraph multi-graph design, 14 domain checks, anti-hallucination stack, multi-agent DD |
| `docs/AMD_DEPLOYMENT.md` | How we deployed Qwen 2.5 14B via vLLM on AMD Instinct MI300X (DigitalOcean-powered AMD Developer Cloud) |
| `docs/HUGGINGFACE_DEPLOYMENT.md` | How we deployed the Streamlit app as a public Hugging Face Space |

For the full submission brief with TAM/SAM, competitor analysis, and the live deployment validation results, see `docs/SUBMISSION.md`.


## License

MIT — see LICENSE. Use, fork, deploy commercially or non-commercially.

## Built by

Team csimpicsirkek (PÁKÁK the AI warriors! on the lablab.ai platform):

- Vince Nándorfi — lead, LangGraph architecture, AMD adaptation
- Erika Nagy — silent partner
- Tamás Vitai
- Gábor Murcsik

For the AMD Developer Hackathon × lablab.ai, May 2026.