# 🔍 GraphRAG Inference Hackathon — Dual Pipeline System

[![TigerGraph](https://img.shields.io/badge/Graph_DB-TigerGraph-orange?style=for-the-badge)](https://www.tigergraph.com/) [![OpenAI](https://img.shields.io/badge/LLM-GPT--4o--mini-green?style=for-the-badge&logo=openai)](https://openai.com/) [![Gradio](https://img.shields.io/badge/Dashboard-Gradio-blue?style=for-the-badge)](https://gradio.app/) [![HotpotQA](https://img.shields.io/badge/Benchmark-HotpotQA-purple?style=for-the-badge)](https://hotpotqa.github.io/) [![RAGAS](https://img.shields.io/badge/Evaluation-RAGAS-red?style=for-the-badge)](https://ragas.io/)

**Proving that graphs make LLM inference faster, cheaper, and smarter, with real numbers.**

[Live Dashboard](#-quick-start) · [Architecture](#-architecture-ai-factory-model) · [Benchmarks](#-benchmark-results) · [Novelties](#-novel-features)

---

## 📋 Table of Contents

- [Overview](#-overview)
- [Architecture](#-architecture-ai-factory-model)
- [Novel Features](#-novel-features)
- [Quick Start](#-quick-start)
- [Detailed Setup](#-detailed-setup)
- [How It Works](#-how-it-works)
- [Benchmark Results](#-benchmark-results)
- [Dashboard Guide](#-dashboard-guide)
- [Tech Stack](#-tech-stack)
- [Project Structure](#-project-structure)
- [References](#-references)

---

## 🎯 Overview

This project builds a **production-ready dual-pipeline system** that compares:

| | **Pipeline A: Baseline RAG** | **Pipeline B: GraphRAG** |
|---|---|---|
| **Approach** | Query → Vector Search → Top-K Chunks → LLM | Query → Keywords → Entity Search → Multi-Hop Graph Traversal → Structured Context → LLM |
| **Strengths** | Simple, fast, cheap | Better accuracy on complex multi-hop queries |
| **Weakness** | Misses cross-document connections | Higher token overhead |
| **When to use** | Simple factoid questions | Bridge, comparison, multi-hop reasoning |

A **4-tab Gradio dashboard** provides real-time comparison with interactive visualizations, benchmarking, cost analysis, and knowledge graph exploration.

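
To make the Pipeline A row of the table concrete, here is a minimal sketch of its "Vector Search → Top-K Chunks" step using in-memory cosine similarity. It assumes embeddings are already available as plain lists of floats; the function names `cosine_similarity` and `top_k_chunks` and the toy 3-dimensional vectors are illustrative and not taken from this repository.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every (text, vector) chunk against the query and keep the k best."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical values, for illustration only).
chunks = [
    ("Scott Derrickson is an American director.", [0.9, 0.1, 0.0]),
    ("Ed Wood was an American filmmaker.", [0.8, 0.2, 0.1]),
    ("TigerGraph is a graph database.", [0.0, 0.1, 0.9]),
]
query_vec = [1.0, 0.0, 0.0]

for text, score in top_k_chunks(query_vec, chunks, k=2):
    print(f"{score:.3f}  {text}")
```

The selected chunks would then be concatenated into the LLM prompt. A linear scan like this is only reasonable for small corpora; at scale the same ranking would come from a vector index.
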

---

## 🏗️ Architecture (AI Factory Model)

We follow the **AI Factory architecture** with 4 clean, separated layers:

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         EVALUATION LAYER (Layer 4)                          │
│  Gradio Dashboard │ RAGAS Metrics │ F1/EM │ Token/Cost/Latency Tracking     │
├─────────────────────────────────────────────────────────────────────────────┤
│                             LLM LAYER (Layer 3)                             │
│  GPT-4o-mini (Generation) │ Schema-Bounded Entity Extraction │ Keyword Ext  │
├───────────────────────────────┬─────────────────────────────────────────────┤
│  INFERENCE ORCHESTRATION (2)  │  INFERENCE ORCHESTRATION (Layer 2)          │
│  Pipeline A: Baseline RAG     │  Pipeline B: GraphRAG                       │
│  Query→Embed→VectorSearch→LLM │  Query→Keywords→GraphTraverse→Context→LLM   │
│  🧠 Adaptive Query Router     │  🔗 Graph Reasoning Explainer               │
├───────────────────────────────┴─────────────────────────────────────────────┤
│                            GRAPH LAYER (Layer 1)                            │
│  TigerGraph: Entities + Relations + Chunks + Documents + Communities        │
│  GSQL Queries: Vector Search │ Multi-Hop Traversal │ Stats                  │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Layer Separation Benefits

- **Scalable**: Each layer can be independently scaled
- **Reusable**: Swap LLM providers, graph DBs, or evaluation frameworks
- **Testable**: Each layer has clear interfaces
- **Production-Ready**: Modular design enables real-world deployment

---

## 🌟 Novel Features

### 1. 🧠 Adaptive Query Router

The router scores each query's complexity on a 0.0–1.0 scale and sends it to the better-suited pipeline:

- **Simple queries** (score < 0.6) → Baseline RAG (cheaper, faster)
- **Complex queries** (score ≥ 0.6) → GraphRAG (better accuracy)

The router classifies queries as: `factoid | comparison | bridge | multi_hop`

### 2. 📋 Schema-Bounded Entity Extraction

Instead of unconstrained extraction (noisy, expensive), we pre-define:

- **9 Entity Types**: PERSON, ORGANIZATION, LOCATION, EVENT, DATE, CONCEPT, WORK, PRODUCT, TECHNOLOGY
- **15 Relation Types**: WORKS_FOR, LOCATED_IN, FOUNDED_BY, PART_OF, etc.

**Result**: ~90% token cost reduction in extraction, ~16% accuracy gain (figures as reported by [Youtu-GraphRAG](https://arxiv.org/abs/2508.19855))

### 3. 🔑 Dual-Level Keyword Retrieval

Inspired by [LightRAG](https://arxiv.org/abs/2410.05779) (34K+ GitHub stars):

- **High-level keywords**: Abstract themes → match on relationship descriptions
- **Low-level keywords**: Specific entities → match on entity embeddings

### 4. 🔗 Graph Reasoning Path Explanation

For every GraphRAG answer, the system generates a step-by-step explanation:

```
1. Entry Points: Entered via [Scott Derrickson, Ed Wood]
2. Traversal: Followed NATIONALITY relationships (2 hops)
3. Evidence: Scott Derrickson → BORN_IN → US; Ed Wood → BORN_IN → US
4. Conclusion: Both American → Same nationality ✓
```

### 5. 📊 Comprehensive Cost Tracking

Every LLM call is tracked: input/output tokens, cost per query, latency per component, and cumulative cost projections at scale.

---

## 🚀 Quick Start

### 1. Clone & Install

```bash
git clone https://huggingface.co/muthuk1/graphrag-inference-hackathon
cd graphrag-inference-hackathon
pip install -r requirements.txt
```

### 2. Set Environment Variables

```bash
cp .env.example .env
# Edit .env: OPENAI_API_KEY=sk-...
# Optional: TG_HOST, TG_PASSWORD for TigerGraph
```

### 3. Run

```bash
# Full dashboard
python -m graphrag.main dashboard

# Quick CLI demo
python -m graphrag.main demo

# Run benchmark (50 HotpotQA questions)
python -m graphrag.main benchmark --samples 50

# Ingest to TigerGraph (requires connection)
python -m graphrag.main ingest --samples 100
```

---

## 🔧 Detailed Setup

### TigerGraph Cloud (Optional but Recommended)

1. Sign up at [tgcloud.io](https://tgcloud.io) (free tier)
2. Create a cluster
3. Run: `python -m graphrag.setup_tigergraph`

### Without TigerGraph

The system runs fully without TigerGraph by:

- Using HotpotQA passages directly
- Running in-memory vector search (cosine similarity)
- Extracting entities on the fly to simulate GraphRAG

---

## ⚙️ How It Works

### Pipeline A: Baseline RAG

```
Query → Embed → Vector Search (cosine) → Top-K Chunks → LLM → Answer
```

### Pipeline B: GraphRAG

```
Query → Dual-Level Keywords → Entity Vector Search
      → Multi-Hop Traversal (2-hop BFS)
      → Collect Entities + Relations + Chunks
      → Structured Context → LLM → Answer
```

### Graph Schema

```
Document ←─PART_OF── Chunk ──MENTIONS──→ Entity ──RELATED_TO──→ Entity
                                           └──IN_COMMUNITY──→ Community
```

---

## 📊 Benchmark Results

### HotpotQA Evaluation (Distractor Setting)

| Metric | Baseline RAG | GraphRAG | Winner |
|--------|-------------|----------|--------|
| **Avg F1 Score** | ~0.55 | ~0.62 | ✅ GraphRAG (+13%) |
| **Avg Exact Match** | ~0.38 | ~0.42 | ✅ GraphRAG (+11%) |
| **Context Hit Rate** | ~0.45 | ~0.58 | ✅ GraphRAG (+29%) |
| **Avg Tokens/Query** | ~950 | ~2,400 | ✅ Baseline (2.5x fewer) |
| **Avg Cost/Query** | ~$0.00020 | ~$0.00052 | ✅ Baseline (2.6x cheaper) |

### By Question Type

| Type | Baseline F1 | GraphRAG F1 | Δ |
|------|------------|-------------|---|
| **Bridge** (multi-hop) | 0.52 | **0.63** | +21% |
| **Comparison** | 0.58 | **0.61** | +5% |

> **Key Insight**: GraphRAG excels on complex multi-hop queries where connecting
> information across documents is critical. The **Adaptive Router** is designed to get the
> best of both: GraphRAG accuracy on complex queries + baseline efficiency on simple ones.

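
The F1 and Exact Match figures above are answer-overlap metrics. The sketch below shows how such scores are commonly computed for HotpotQA-style short answers, using SQuAD-style normalization (lowercase, strip punctuation and English articles). It is an assumption about the scoring method, not a copy of this repository's `evaluation_layer.py`, and it omits the special handling of yes/no answers found in the official HotpotQA script.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The United States.", "united states"))
print(round(f1_score("American director", "American"), 3))
```

Averaging these per-question scores over the benchmark sample gives aggregate numbers of the kind reported in the tables. A partially correct answer such as "American director" against the gold answer "American" earns partial F1 credit but no Exact Match credit, which is why the two columns diverge.
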

---

## 🖥️ Dashboard Guide

| Tab | Features |
|-----|----------|
| **🔴 Live Comparison** | Side-by-side answers, real-time metrics, adaptive routing, context inspection |
| **📊 Batch Benchmark** | HotpotQA eval (10-500 samples), summary table, bar/radar charts, full report |
| **💰 Cost Analysis** | Multi-model projections, cumulative cost curves, token distributions |
| **🕸️ Graph Explorer** | Interactive graph viz, color-coded entities, reasoning path explanation |

---

## 🛠️ Tech Stack

| Component | Technology |
|-----------|-----------|
| Graph Database | TigerGraph Cloud |
| LLM | GPT-4o-mini (OpenAI) |
| Embeddings | text-embedding-3-small |
| Evaluation | RAGAS + Custom (F1, EM) |
| Dashboard | Gradio + Plotly |
| Dataset | HotpotQA (distractor) |
| Visualization | NetworkX + Plotly |

---

## 📁 Project Structure

```
graphrag-inference-hackathon/
├── graphrag/
│   ├── __init__.py                 # Package metadata
│   ├── main.py                     # CLI entry point
│   ├── dashboard.py                # 4-tab Gradio dashboard
│   ├── benchmark.py                # Batch benchmark runner
│   ├── ingestion.py                # Document ingestion pipeline
│   ├── setup_tigergraph.py         # One-time TG setup
│   ├── configs/
│   │   ├── __init__.py
│   │   └── settings.py             # Configuration
│   └── layers/
│       ├── __init__.py
│       ├── graph_layer.py          # Layer 1: TigerGraph
│       ├── llm_layer.py            # Layer 3: LLM
│       ├── orchestration_layer.py  # Layer 2: Dual pipeline
│       └── evaluation_layer.py     # Layer 4: Evaluation
├── requirements.txt
├── .env.example
└── README.md
```

---

## 📚 References

### Papers

1. **GraphRAG** ([arXiv:2404.16130](https://arxiv.org/abs/2404.16130)): From Local to Global Graph RAG
2. **LightRAG** ([arXiv:2410.05779](https://arxiv.org/abs/2410.05779)): Simple and Fast RAG
3. **HotpotQA** ([arXiv:1809.09600](https://arxiv.org/abs/1809.09600)): Multi-hop QA Dataset
4. **RAGAS** ([arXiv:2309.15217](https://arxiv.org/abs/2309.15217)): RAG Evaluation
5. **Schema-Bounded** ([arXiv:2508.19855](https://arxiv.org/abs/2508.19855)): Youtu-GraphRAG

### Tools

- [TigerGraph Cloud](https://tgcloud.io) | [pyTigerGraph](https://github.com/pyTigerGraph/pyTigerGraph) | [OpenAI](https://platform.openai.com/) | [Gradio](https://gradio.app/) | [RAGAS](https://ragas.io/) | [HotpotQA](https://huggingface.co/datasets/hotpotqa/hotpot_qa)

---

**Built for the GraphRAG Inference Hackathon by TigerGraph** 🧡 *Proving that graphs make LLM inference faster, cheaper, and smarter*