AMD Developer Hackathon × lablab.ai · May 2026
PaperHawk
Multi-agent document intelligence on AMD Instinct MI300X.
Built by engineers who ship.
The Problem
RAG retrieves.
Audit finds.
Today's RAG chatbots can do the first. They cannot do the second.
What RAG does well
Chunk a document. Embed the chunks. Retrieve top-K passages. Generate an answer with the retrieved context.
Great for FAQ chatbots. Great for Q&A on a single document.
What auditors actually need
"Does the supplier in Invoice #7 match the vendor in PO #3? Is the VAT rate consistent across the package? Any change-of-control clauses? Sanctions hits?"
These questions live in the relationship between documents — not in any single chunk.
What We Built
A multi-agent system.
Not a retrieval pipeline.
LangGraph 0.6-native. Production-shaped. Open source under MIT.
14
Deterministic
domain checks
Send-API parallelism · AsyncSqliteSaver checkpointer · configurable_alternatives provider stack (vLLM / Ollama / dummy) · multi-agent DD assistant with 4 specialists + supervisor + synthesizer · Streamlit 5-tab UI · 61 tests passing in CI without an LLM.
The Pipeline
Five steps. End-to-end.
Every step is a typed Pydantic-state node. Every LLM call has structured output.
1
Ingest
PDF · DOCX · image. Vision-first OCR fallback for scanned pages.
2
Classify
6-way doc-type classifier. ISA 500 evidence-quality score.
3
Extract
Pydantic schema per doc-type. _quotes + _confidence per field.
4
Cross-ref
3-way matching. Package-level analyzer. DD multi-agent.
5
Risk + Report
14 checks (parallel Send) · LLM ensemble · 3-layer filter · DOCX export.
On AMD MI300X with Qwen 2.5 14B: 30–90 seconds end-to-end per package.
Beyond LLMs · Deterministic Reasoning
Fourteen rules. In Python.
Every check is a typed Protocol, not a prompt. Run in parallel via the LangGraph Send API.
Tier A — Audit · 6 checks
- ISA 500 Evidence hierarchy
- ISA 320 Materiality threshold
- ISA 240 Duplicate invoice detector
- ISA 240 Rounded-amount anomaly
- Tax-ID CDV mod-11 checksum
- Mandatory fields Invoice completeness
Tier B — Compliance · 4 checks
- GDPR Art. 28 Sub-processor clause
- AML / Sanctions EU + OFAC fuzzy match
- M&A red flag Change-of-control · auto-renewal
- Disproportionality Penalty-vs-value ratio
Tier C — Standards · 4 checks
- Incoterms 2020 11-rule recognizer
- IFRS / GAAP Goodwill + lease anomaly
- Math validation Net + VAT + gross
- Contract completeness 6-key-clause check
Jurisdiction-aware: locale-specific rules trigger only on locale-tagged inputs. Universal rules run everywhere.
Trust by Design
Anti-halluc 5+1. DD multi-agent.
1
temperature=0 on every LLM call
2
_quotes verbatim source citation
3
_confidence per extracted field
4
Plausibility validators (math · dates · ranges)
5
3-layer LLM-risk filter chain
+1
Quote validator: drops claims whose quotes aren't in the doc
Audit specialist
Legal specialist
Compliance specialist
Financial specialist
↓
Supervisor — routing & coordination
↓
Synthesizer → Executive Summary
Four specialists read the same package independently. The supervisor coordinates routing. The synthesizer writes a 3-paragraph executive brief with cited red flags.
The Stack
Qwen on AMD MI300X via vLLM.
192 GB HBM3. ROCm-native. Open-source models, end-to-end.
Streamlit · 5-tab UI
Upload · Results · Chat · DD · Report
LangGraph 0.6 orchestration
4 graphs · 6 subgraphs · Send API · AsyncSqliteSaver
Qwen 2.5 14B Instruct (open source)
tool-calling · structured-output · multilingual
vLLM continuous batching
--api-key · --max-model-len 32768 · OpenAI-compatible
AMD Instinct MI300X · ROCm
192 GB HBM3 · BF16 / FP8 · AMD Developer Cloud
Hugging Face Spaces deploy
lablab-ai-amd-developer-hackathon · Streamlit SDK
See It In Action
Three one-click demos.
Bundled in the repo. Drivable from the Streamlit Upload tab in 30 seconds.
Audit Demo
Three invoices from the same supplier. The March one is 50% pricier than January and February.
→ ISA 240 over-billing pattern flagged with cited line items.
DD Demo
NDA + service agreement + amendment in an acquisition scenario.
→ Hidden change-of-control + auto-renewal red flags.
Compliance Demo
Two contracts; one is missing GDPR Article 28 sub-processor language.
→ Domain check #8 detects the gap with regulatory citation.
On AMD MI300X with Qwen 2.5 14B Instruct: 30–90 seconds per package · end-to-end · with citations.
Open · Reproducible · Public
Built for builders.
MIT licensed. Reproducible from a clean clone. No closed weights, no proprietary extensions.
/ 01
Open source · MIT
Public GitHub repo. No "training data not included" footnotes. Clone it, run it, fork it. The whole codebase is yours to read.
/ 02
Reproducible
Same stack from laptop to MI300X. infra/vllm/Dockerfile + serve.sh + requirements.txt. One command, one container.
/ 03
Battle-tested
61 tests passing in CI without any LLM. Deterministic dummy provider for CI; vLLM and Ollama for everything else.
github.com/nandorfivince/paperhawk
|
HF Space: lablab-ai-amd-developer-hackathon/paperhawk
|
License: MIT
The Team
Three engineers.
One shipped product.
We've shipped together for nearly a decade. PaperHawk is what happens when domain knowledge, engineering rigor, and product instinct meet on the same codebase.
Lead · LangGraph · AMD Adaptation
Vince Nándorfi
System architecture, domain research, ROCm/vLLM adaptation, testing. PaperHawk's blueprint and the AMD-edition rewrite.
Engineering · DevOps
Tamás Vitai
Senior++ engineer. Implementation, infrastructure, integration testing. Where the code meets the runtime.
Engineering · Algorithms
Gábor Murcsik
Engineering rigor. Algorithmic precision. Senior systems thinking, sharpened over years of complex production builds.
Beyond simple RAG. Built to ship.