---
title: PaperHawk
emoji: 🦅
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
license: mit
short_description: Real-DI-Audit/14 rules/6 anti-halluc/LangGraph/Qwen/MI300X
---

Agentic document intelligence on AMD MI300X
Multi-document due diligence with deterministic compliance rules and a 6-layer anti-hallucination stack.
Built for the AMD Developer Hackathon × lablab.ai (May 2026).

---

## What is PaperHawk?

PaperHawk is an **agentic multi-document intelligence platform** for auditors, lawyers, tax advisors, and due-diligence (DD) analysts. It processes 3–50 PDFs simultaneously and detects **cross-document red flags humans miss**, such as a 57.5% price drift across three invoices from the same supplier. It does this with multi-agent LangGraph orchestration on top of Qwen 2.5 14B Instruct, served via vLLM on AMD Instinct MI300X.

It is **not** a chatbot. It is a typed-state, multi-graph reasoning system with deterministic compliance rules, verbatim source citations, and a quote validator that catches LLM hallucinations before they reach the user.

## Why it matters

A senior auditor needs ~8 hours to thoroughly review a 50-page invoice/contract package. ChatGPT, Copilot, and Harvey handle one document at a time, hallucinate citations, and lack jurisdiction-specific compliance knowledge. PaperHawk handles the entire package, applies 14 statutory rules hand-coded in Python, and finishes a 3-document audit in **23.3 seconds** (61.7× faster than the ~24 minutes a manual review takes), with auditor-grade citations and ISA/GDPR/HU-VAT mappings.
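
The quote-validator idea mentioned above can be illustrated with a small sketch. This is not PaperHawk's actual implementation: the `Finding` dataclass, the `validate_quotes` function, and the sample file names are hypothetical, and the matching rule (case-insensitive, whitespace-normalized substring match) is an assumption. The point is only that a finding is rejected whenever its quote cannot be found in the document it cites.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical shape of an LLM-produced finding (not PaperHawk's real schema)."""
    claim: str
    quote: str   # text the model says it copied verbatim from the source
    source: str  # filename the quote is attributed to


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so PDF line breaks do not cause false rejections."""
    return re.sub(r"\s+", " ", text).strip().lower()


def validate_quotes(findings, documents):
    """Split findings into (accepted, rejected) by checking each quote against its source text."""
    accepted, rejected = [], []
    for finding in findings:
        # An unknown source yields empty text, so the quote cannot match.
        source_text = normalize(documents.get(finding.source, ""))
        quote = normalize(finding.quote)
        if quote and quote in source_text:
            accepted.append(finding)
        else:
            rejected.append(finding)
    return accepted, rejected


# Illustrative data only: one extracted document and three model findings.
documents = {
    "invoice_001.pdf": "Unit price: 1,000 EUR\nPayment due within\n30 days.",
}
findings = [
    Finding("Payment term is 30 days", "Payment due within 30 days", "invoice_001.pdf"),
    Finding("Late fee applies", "A late fee of 5% applies", "invoice_001.pdf"),
    Finding("Price is 1,000 EUR", "Unit price: 1,000 EUR", "missing.pdf"),
]
accepted, rejected = validate_quotes(findings, documents)
```

Here only the first finding is accepted: the second quote does not occur in the invoice text, and the third cites a file that is not in the package. A production validator would likely need fuzzier matching for OCR noise, which this sketch deliberately leaves out.
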

---

## Technical highlights

- **Multi-agent LangGraph 0.6 orchestration**: 4 compiled graphs (pipeline, chat, DD, package_insights) + 6 reusable subgraphs with Send-API parallelism
- **5-tool agentic chat** with strict `[Source: filename.pdf]` citations validated by a post-processor (no provenance → no answer)
- **6-layer anti-hallucination stack**: `temperature=0`, verbatim source quotes, field-level confidence, plausibility validators, 3-stage LLM-risk filter chain, quote validator
- **Provider abstraction** with `configurable_alternatives`: vLLM (production) / Ollama (local dev) / dummy (CI); swap with one env var, zero code changes
- **AMD Instinct MI300X via vLLM**: 192 GB HBM3, 27.6 GB model + 141 GB available KV cache, 307 t/s prompt + 252 t/s generation, 30.4% prefix cache hit rate
- **61.7× speedup** vs manual audit on a 3-document package (23.3 sec vs ~24 min)
- **Hugging Face Space deployable** with Docker SDK + Git LFS for binary assets

## Domain highlights

- **14 deterministic statutory rules** hand-coded in Python (NOT prompt-engineered): ISA 240/320/500 audit standards, HU VAT Act §169 mandatory invoice elements, Ptk. 6:98 disproportionate penalty clauses, Art. 22 tax-ID validation, GDPR Article 28 sub-processor language, Incoterms 2020, AML sanctions list (EU/OFAC fuzzy match)
- **Cross-document red flag detection**: three-way matching (invoice + delivery note + PO), package-level pricing anomalies, duplicate-invoice detection (ISA 240), change-of-control trigger detection (M&A DD)
- **Multi-agent DD assistant**: 4 specialists (audit / legal / compliance / financial) coordinated by a supervisor and a synthesizer for executive summaries
- **Auditor-grade citations**: every finding maps to a regulation source (HU VAT Act §169, ISA 500, GDPR Art. 28, etc.) with a verbatim source quote
- **Multilingual ingest**: EN / HU / DE OCR via Tesseract, native PDF + DOCX, vision-first scanned-PDF fallback

---

## Try the live demo

**Public Hugging Face Space** (no signup, runs in browser): →