AMD Developer Hackathon × lablab.ai · May 2026

PaperHawk

Multi-agent document intelligence on AMD Instinct MI300X.
Built by engineers who ship.

Vince Nándorfi Tamás Vitai Gábor Murcsik

Team CsimpiCsirkek · MIT

The Problem

RAG retrieves.
Audit finds.

Today's RAG chatbots can do the first. They cannot do the second.

What RAG does well

Chunk a document. Embed the chunks. Retrieve top-K passages. Generate an answer with the retrieved context.

Great for FAQ chatbots. Great for Q&A on a single document.

What auditors actually need

"Does the supplier in Invoice #7 match the vendor in PO #3? Is the VAT rate consistent across the package? Any change-of-control clauses? Sanctions hits?"

These questions live in the relationship between documents — not in any single chunk.

What We Built

A multi-agent system.
Not a retrieval pipeline.

LangGraph 0.6-native. Production-shaped. Open source under MIT.

4

Compiled
graphs

6

Reusable
subgraphs

14

Deterministic
domain checks

5+1

Anti-halluc
layers

5

Agentic
chat tools

Send-API parallelism · AsyncSqliteSaver checkpointer · configurable_alternatives provider stack (vLLM / Ollama / dummy) · multi-agent DD assistant with 4 specialists + supervisor + synthesizer · Streamlit 5-tab UI · 61 tests passing in CI without an LLM.

The Pipeline

Five steps. End-to-end.

Every step is a typed Pydantic-state node. Every LLM call has structured output.

1

Ingest

PDF · DOCX · image. Vision-first OCR fallback for scanned pages.

2

Classify

6-way doc-type classifier. ISA 500 evidence-quality score.

3

Extract

Pydantic schema per doc-type. _quotes + _confidence per field.

4

Cross-ref

3-way matching. Package-level analyzer. DD multi-agent.

5

Risk + Report

14 checks (parallel Send) · LLM ensemble · 3-layer filter · DOCX export.

On AMD MI300X with Qwen 2.5 14B: 30–90 seconds end-to-end per package.

Beyond LLMs · Deterministic Reasoning

Fourteen rules. In Python.

Every check is a typed Protocol, not a prompt. Run in parallel via the LangGraph Send API.

Tier A — Audit · 6 checks

ISA 500 Evidence hierarchy
ISA 320 Materiality threshold
ISA 240 Duplicate invoice detector
ISA 240 Rounded-amount anomaly
Tax-ID CDV mod-11 checksum
Mandatory fields Invoice completeness

Tier B — Compliance · 4 checks

GDPR Art. 28 Sub-processor clause
AML / Sanctions EU + OFAC fuzzy match
M&A red flag Change-of-control · auto-renewal
Disproportionality Penalty-vs-value ratio

Tier C — Standards · 4 checks

Incoterms 2020 11-rule recognizer
IFRS / GAAP Goodwill + lease anomaly
Math validation Net + VAT + gross
Contract completeness 6-key-clause check

Jurisdiction-aware: locale-specific rules trigger only on locale-tagged inputs. Universal rules run everywhere.

Trust by Design

Anti-halluc 5+1. DD multi-agent.

5+1 layers, every output

1

temperature=0 on every LLM call

2

_quotes verbatim source citation

3

_confidence per extracted field

4

Plausibility validators (math · dates · ranges)

5

3-layer LLM-risk filter chain

+1

Quote validator: drops claims whose quotes aren't in the doc

DD supervisor pattern

Audit specialist

Legal specialist

Compliance specialist

Financial specialist

↓

Supervisor — routing & coordination

↓

Synthesizer → Executive Summary

Four specialists read the same package independently. The supervisor coordinates routing. The synthesizer writes a 3-paragraph executive brief with cited red flags.

The Stack

Qwen on AMD MI300X via vLLM.

192 GB HBM3. ROCm-native. Open-source models, end-to-end.

Streamlit · 5-tab UI

Upload · Results · Chat · DD · Report

LangGraph 0.6 orchestration

4 graphs · 6 subgraphs · Send API · AsyncSqliteSaver

Qwen 2.5 14B Instruct (open source)

tool-calling · structured-output · multilingual

vLLM continuous batching

--api-key · --max-model-len 32768 · OpenAI-compatible

AMD Instinct MI300X · ROCm

192 GB HBM3 · BF16 / FP8 · AMD Developer Cloud

Hugging Face Spaces deploy

lablab-ai-amd-developer-hackathon · Streamlit SDK

See It In Action

Three one-click demos.

Bundled in the repo. Drivable from the Streamlit Upload tab in 30 seconds.

Audit Demo

Three invoices from the same supplier. The March one is 50% pricier than January and February.

→ ISA 240 over-billing pattern flagged with cited line items.

DD Demo

NDA + service agreement + amendment in an acquisition scenario.

→ Hidden change-of-control + auto-renewal red flags.

Compliance Demo

Two contracts; one is missing GDPR Article 28 sub-processor language.

→ Domain check #8 detects the gap with regulatory citation.

Open · Reproducible · Public

Built for builders.

MIT licensed. Reproducible from a clean clone. No closed weights, no proprietary extensions.

/ 01

Open source · MIT

Public GitHub repo. No "training data not included" footnotes. Clone it, run it, fork it. The whole codebase is yours to read.

/ 02

Reproducible

Same stack from laptop to MI300X. infra/vllm/Dockerfile + serve.sh + requirements.txt. One command, one container.

/ 03

Battle-tested

61 tests passing in CI without any LLM. Deterministic dummy provider for CI; vLLM and Ollama for everything else.

github.com/nandorfivince/paperhawk | HF Space: lablab-ai-amd-developer-hackathon/paperhawk | License: MIT

The Team

Three engineers.
One shipped product.

We've shipped together for nearly a decade. PaperHawk is what happens when domain knowledge, engineering rigor, and product instinct meet on the same codebase.

Lead · LangGraph · AMD Adaptation

Vince Nándorfi

System architecture, domain research, ROCm/vLLM adaptation, testing. PaperHawk's blueprint and the AMD-edition rewrite.

Engineering · DevOps

Tamás Vitai

Senior++ engineer. Implementation, infrastructure, integration testing. Where the code meets the runtime.

Engineering · Algorithms

Gábor Murcsik

Engineering rigor. Algorithmic precision. Senior systems thinking, sharpened over years of complex production builds.

Beyond simple RAG. Built to ship.