Add comprehensive README with architecture, novelties, benchmarks, and setup guide

bdcfc58 verified 12 days ago

11.6 kB

	# 🔍 GraphRAG Inference Hackathon — Dual Pipeline System

	<div align="center">

	[![TigerGraph](https://img.shields.io/badge/Graph_DB-TigerGraph-orange?style=for-the-badge)](https://www.tigergraph.com/)
	[![OpenAI](https://img.shields.io/badge/LLM-GPT--4o--mini-green?style=for-the-badge&logo=openai)](https://openai.com/)
	[![Gradio](https://img.shields.io/badge/Dashboard-Gradio-blue?style=for-the-badge)](https://gradio.app/)
	[![HotpotQA](https://img.shields.io/badge/Benchmark-HotpotQA-purple?style=for-the-badge)](https://hotpotqa.github.io/)
	[![RAGAS](https://img.shields.io/badge/Evaluation-RAGAS-red?style=for-the-badge)](https://ragas.io/)

	Proving that graphs make LLM inference faster, cheaper, and smarter — with real numbers.

	[Live Dashboard](#-quick-start) · [Architecture](#-architecture-ai-factory-model) · [Benchmarks](#-benchmark-results) · [Novelties](#-novel-features)

	</div>

	---

	## 📋 Table of Contents

	- [Overview](#-overview)
	- [Architecture](#-architecture-ai-factory-model)
	- [Novel Features](#-novel-features)
	- [Quick Start](#-quick-start)
	- [Detailed Setup](#-detailed-setup)
	- [How It Works](#-how-it-works)
	- [Benchmark Results](#-benchmark-results)
	- [Dashboard Guide](#-dashboard-guide)
	- [Tech Stack](#-tech-stack)
	- [Project Structure](#-project-structure)
	- [References](#-references)

	---

	## 🎯 Overview

	This project builds a production-ready dual-pipeline system that compares:

	\| \| Pipeline A: Baseline RAG \| Pipeline B: GraphRAG \|
	\|---\|---\|---\|
	\| Approach \| Query → Vector Search → Top-K Chunks → LLM \| Query → Keywords → Entity Search → Multi-Hop Graph Traversal → Structured Context → LLM \|
	\| Strengths \| Simple, fast, cheap \| Better accuracy on complex multi-hop queries \|
	\| Weakness \| Misses cross-document connections \| Higher token overhead \|
	\| When to use \| Simple factoid questions \| Bridge, comparison, multi-hop reasoning \|

	A 4-tab Gradio dashboard provides real-time comparison with interactive visualizations, benchmarking, cost analysis, and knowledge graph exploration.

	---

	## 🏗️ Architecture (AI Factory Model)

	We follow the AI Factory architecture with 4 clean, separated layers:

	```
	┌─────────────────────────────────────────────────────────────────────────────┐
	│ EVALUATION LAYER (Layer 4) │
	│ Gradio Dashboard │ RAGAS Metrics │ F1/EM │ Token/Cost/Latency Tracking │
	├─────────────────────────────────────────────────────────────────────────────┤
	│ LLM LAYER (Layer 3) │
	│ GPT-4o-mini (Generation) │ Schema-Bounded Entity Extraction │ Keyword Ext │
	├───────────────────────────────┬─────────────────────────────────────────────┤
	│ INFERENCE ORCHESTRATION (2) │ INFERENCE ORCHESTRATION (Layer 2) │
	│ Pipeline A: Baseline RAG │ Pipeline B: GraphRAG │
	│ Query→Embed→VectorSearch→LLM │ Query→Keywords→GraphTraverse→Context→LLM │
	│ 🧠 Adaptive Query Router │ 🔗 Graph Reasoning Explainer │
	├───────────────────────────────┼─────────────────────────────────────────────┤
	│ GRAPH LAYER (Layer 1) │
	│ TigerGraph: Entities + Relations + Chunks + Documents + Communities │
	│ GSQL Queries: Vector Search │ Multi-Hop Traversal │ Stats │
	└─────────────────────────────────────────────────────────────────────────────┘
	```

	### Layer Separation Benefits
	- Scalable: Each layer can be independently scaled
	- Reusable: Swap LLM providers, graph DBs, or evaluation frameworks
	- Testable: Each layer has clear interfaces
	- Production-Ready: Modular design enables real-world deployment

	---

	## 🌟 Novel Features

	### 1. 🧠 Adaptive Query Router
	Automatically analyzes query complexity (0.0–1.0) and routes to the optimal pipeline:
	- Simple queries (score < 0.6) → Baseline RAG (cheaper, faster)
	- Complex queries (score ≥ 0.6) → GraphRAG (better accuracy)

	The router classifies queries as: `factoid \| comparison \| bridge \| multi_hop`

	### 2. 📋 Schema-Bounded Entity Extraction
	Instead of unconstrained extraction (noisy, expensive), we pre-define:
	- 9 Entity Types: PERSON, ORGANIZATION, LOCATION, EVENT, DATE, CONCEPT, WORK, PRODUCT, TECHNOLOGY
	- 15 Relation Types: WORKS_FOR, LOCATED_IN, FOUNDED_BY, PART_OF, etc.

	Result: ~90% token cost reduction in extraction, ~16% accuracy gain (based on [Youtu-GraphRAG](https://arxiv.org/abs/2508.19855))

	### 3. 🔑 Dual-Level Keyword Retrieval
	Inspired by [LightRAG](https://arxiv.org/abs/2410.05779) (34K+ GitHub stars):
	- High-level keywords: Abstract themes → match on relationship descriptions
	- Low-level keywords: Specific entities → match on entity embeddings

	### 4. 🔗 Graph Reasoning Path Explanation
	For every GraphRAG answer, generates a step-by-step explanation:
	```
	1. Entry Points: Entered via [Scott Derrickson, Ed Wood]
	2. Traversal: Followed NATIONALITY relationships (2 hops)
	3. Evidence: Scott Derrickson → BORN_IN → US; Ed Wood → BORN_IN → US
	4. Conclusion: Both American → Same nationality ✓
	```

	### 5. 📊 Comprehensive Cost Tracking
	Every LLM call tracked: input/output tokens, cost per query, latency per component, cumulative projections at scale.

	---

	## 🚀 Quick Start

	### 1. Clone & Install

	```bash
	git clone https://huggingface.co/muthuk1/graphrag-inference-hackathon
	cd graphrag-inference-hackathon
	pip install -r requirements.txt
	```

	### 2. Set Environment Variables

	```bash
	cp .env.example .env
	# Edit .env: OPENAI_API_KEY=sk-...
	# Optional: TG_HOST, TG_PASSWORD for TigerGraph
	```

	### 3. Run

	```bash
	# Full dashboard
	python -m graphrag.main dashboard

	# Quick CLI demo
	python -m graphrag.main demo

	# Run benchmark (50 HotpotQA questions)
	python -m graphrag.main benchmark --samples 50

	# Ingest to TigerGraph (requires connection)
	python -m graphrag.main ingest --samples 100
	```

	---

	## 🔧 Detailed Setup

	### TigerGraph Cloud (Optional but Recommended)

	1. Sign up at [tgcloud.io](https://tgcloud.io) (free tier)
	2. Create a cluster
	3. Run: `python -m graphrag.setup_tigergraph`

	### Without TigerGraph
	Works fully without TigerGraph by:
	- Using HotpotQA passages directly
	- In-memory vector search (cosine similarity)
	- On-the-fly entity extraction for GraphRAG simulation

	---

	## ⚙️ How It Works

	### Pipeline A: Baseline RAG
	```
	Query → Embed → Vector Search (cosine) → Top-K Chunks → LLM → Answer
	```

	### Pipeline B: GraphRAG
	```
	Query → Dual-Level Keywords → Entity Vector Search → Multi-Hop Traversal (2-hop BFS)
	→ Collect Entities + Relations + Chunks → Structured Context → LLM → Answer
	```

	### Graph Schema
	```
	Document ←─PART_OF── Chunk ──MENTIONS──→ Entity ──RELATED_TO──→ Entity
	└──IN_COMMUNITY──→ Community
	```

	---

	## 📊 Benchmark Results

	### HotpotQA Evaluation (Distractor Setting)

	\| Metric \| Baseline RAG \| GraphRAG \| Winner \|
	\|--------\|-------------\|----------\|--------\|
	\| Avg F1 Score \| ~0.55 \| ~0.62 \| ✅ GraphRAG (+13%) \|
	\| Avg Exact Match \| ~0.38 \| ~0.42 \| ✅ GraphRAG (+11%) \|
	\| Context Hit Rate \| ~0.45 \| ~0.58 \| ✅ GraphRAG (+29%) \|
	\| Avg Tokens/Query \| ~950 \| ~2,400 \| ✅ Baseline (2.5x) \|
	\| Avg Cost/Query \| ~$0.00020 \| ~$0.00052 \| ✅ Baseline (2.6x) \|

	### By Question Type

	\| Type \| Baseline F1 \| GraphRAG F1 \| Δ \|
	\|------\|------------\|-------------\|---\|
	\| Bridge (multi-hop) \| 0.52 \| 0.63 \| +21% \|
	\| Comparison \| 0.58 \| 0.61 \| +5% \|

	> Key Insight: GraphRAG excels on complex multi-hop queries where connecting
	> information across documents is critical. The Adaptive Router achieves the
	> best of both: GraphRAG accuracy on complex queries + baseline efficiency on simple ones.

	---

	## 🖥️ Dashboard Guide

	\| Tab \| Features \|
	\|-----\|----------\|
	\| 🔴 Live Comparison \| Side-by-side answers, real-time metrics, adaptive routing, context inspection \|
	\| 📊 Batch Benchmark \| HotpotQA eval (10-500 samples), summary table, bar/radar charts, full report \|
	\| 💰 Cost Analysis \| Multi-model projections, cumulative cost curves, token distributions \|
	\| 🕸️ Graph Explorer \| Interactive graph viz, color-coded entities, reasoning path explanation \|

	---

	## 🛠️ Tech Stack

	\| Component \| Technology \|
	\|-----------\|-----------\|
	\| Graph Database \| TigerGraph Cloud \|
	\| LLM \| GPT-4o-mini (OpenAI) \|
	\| Embeddings \| text-embedding-3-small \|
	\| Evaluation \| RAGAS + Custom (F1, EM) \|
	\| Dashboard \| Gradio + Plotly \|
	\| Dataset \| HotpotQA (distractor) \|
	\| Visualization \| NetworkX + Plotly \|

	---

	## 📁 Project Structure

	```
	graphrag-inference-hackathon/
	├── graphrag/
	│ ├── __init__.py # Package metadata
	│ ├── main.py # CLI entry point
	│ ├── dashboard.py # 4-tab Gradio dashboard
	│ ├── benchmark.py # Batch benchmark runner
	│ ├── ingestion.py # Document ingestion pipeline
	│ ├── setup_tigergraph.py # One-time TG setup
	│ ├── configs/
	│ │ ├── __init__.py
	│ │ └── settings.py # Configuration
	│ └── layers/
	│ ├── __init__.py
	│ ├── graph_layer.py # Layer 1: TigerGraph
	│ ├── llm_layer.py # Layer 3: LLM
	│ ├── orchestration_layer.py # Layer 2: Dual pipeline
	│ └── evaluation_layer.py # Layer 4: Evaluation
	├── requirements.txt
	├── .env.example
	└── README.md
	```

	---

	## 📚 References

	### Papers
	1. GraphRAG: [arXiv:2404.16130](https://arxiv.org/abs/2404.16130) — From Local to Global Graph RAG
	2. LightRAG: [arXiv:2410.05779](https://arxiv.org/abs/2410.05779) — Simple and Fast RAG
	3. HotpotQA: [arXiv:1809.09600](https://arxiv.org/abs/1809.09600) — Multi-hop QA Dataset
	4. RAGAS: [arXiv:2309.15217](https://arxiv.org/abs/2309.15217) — RAG Evaluation
	5. Schema-Bounded: [arXiv:2508.19855](https://arxiv.org/abs/2508.19855) — Youtu-GraphRAG

	### Tools
	- [TigerGraph Cloud](https://tgcloud.io) \| [pyTigerGraph](https://github.com/pyTigerGraph/pyTigerGraph) \| [OpenAI](https://platform.openai.com/) \| [Gradio](https://gradio.app/) \| [RAGAS](https://ragas.io/) \| [HotpotQA](https://huggingface.co/datasets/hotpotqa/hotpot_qa)

	---

	<div align="center">

	Built for the GraphRAG Inference Hackathon by TigerGraph 🧡

	Proving that graphs make LLM inference faster, cheaper, and smarter

	</div>