Fix README: remove HF frontmatter, correct dataset/eval/quickstart/latency
Browse files- Dataset: fix arXiv 1200 papers β Wikipedia 478 docs / 8771 chunks / 2.5M tokens;
update ingestion commands and entity-relationship examples
- Evaluation Framework: fix F1 0.6417β0.7467, token reduction -82%β-44%,
cost and latency numbers now match Headline Numbers table
- Demo section: replace python Gradio command with actual Next.js workflow
- Ablation header: remove HotpotQA/GPT-4o-mini, reflect Wikipedia+Gemini setup
- Performance row: update latency to reflect parallel execution (~2.7s)
- Quick Start: lead with Next.js dashboard; Python ingestion as optional step
- Project Structure: add benchmark route, retrieval.ts, llm-providers.ts, corpus.jsonl
- New section: Latency Architecture β explains 2-phase parallel pipeline,
embedding cache, client reuse, and benchmark parallelization
|
@@ -1,21 +1,3 @@
|
|
| 1 |
-
---
|
| 2 |
-
title: GraphRAG Inference Hackathon
|
| 3 |
-
emoji: π
|
| 4 |
-
colorFrom: orange
|
| 5 |
-
colorTo: blue
|
| 6 |
-
sdk: static
|
| 7 |
-
pinned: false
|
| 8 |
-
license: mit
|
| 9 |
-
tags:
|
| 10 |
-
- graphrag
|
| 11 |
-
- tigergraph
|
| 12 |
-
- rag
|
| 13 |
-
- knowledge-graph
|
| 14 |
-
- benchmarking
|
| 15 |
-
- llm
|
| 16 |
-
- inference
|
| 17 |
-
---
|
| 18 |
-
|
| 19 |
# π GraphRAG Inference Hackathon β 3-Pipeline Benchmarking System
|
| 20 |
|
| 21 |
<div align="center">
|
|
@@ -62,7 +44,7 @@ Proving that graphs make LLM inference faster, cheaper, and smarter β backed b
|
|
| 62 |
| **Token Reduction** (GraphRAG vs Basic RAG) | 30% | **β44%** fewer tokens (163 vs 290 avg/query) | β
π |
|
| 63 |
| **Answer Accuracy** (LLM-Judge β₯ 90%) | 30% | **92% pass rate** | β
π BONUS |
|
| 64 |
| **Answer Accuracy** (BERTScore β₯ 0.55) | 30% | **0.58 rescaled** | β
π BONUS |
|
| 65 |
-
| **Performance** (latency, throughput) | 20% |
|
| 66 |
| **Engineering & Storytelling** | 20% | 14 novelties, 12 papers, live dashboard | β
|
|
| 67 |
|
| 68 |
### Why GraphRAG Beats Both Baselines
|
|
@@ -97,19 +79,20 @@ At 1M queries/month: $19,000/month saved vs Basic RAG, with higher accuracy.
|
|
| 97 |
### 3-Pipeline Dashboard in Action
|
| 98 |
|
| 99 |
<!-- Replace with actual GIF after recording -->
|
| 100 |
-

|
| 194 |
|
| 195 |
-
### Our Dataset:
|
| 196 |
|
| 197 |
| Property | Value |
|
| 198 |
|---|---|
|
| 199 |
-
| **Domain** |
|
| 200 |
-
| **Source** |
|
| 201 |
-
| **Size** | ~2.
|
| 202 |
-
| **Documents** |
|
| 203 |
-
| **
|
| 204 |
-
| **
|
|
|
|
| 205 |
|
| 206 |
### Ingestion
|
| 207 |
|
| 208 |
```bash
|
| 209 |
-
#
|
| 210 |
-
python
|
| 211 |
-
|
| 212 |
-
#
|
| 213 |
-
python
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
"
|
| 217 |
-
|
|
|
|
| 218 |
```
|
| 219 |
|
| 220 |
-
### Why
|
| 221 |
|
| 222 |
-
|
| 223 |
-
- `"
|
| 224 |
-
-
|
| 225 |
|
| 226 |
-
|
| 227 |
|
| 228 |
---
|
| 229 |
|
|
@@ -248,6 +233,38 @@ This is exactly what GraphRAG excels at vs Basic RAG.
|
|
| 248 |
|
| 249 |
---
|
| 250 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 251 |
## π 14 Novel Techniques
|
| 252 |
|
| 253 |
### Graph Retrieval (6 papers, wired into Pipeline 3 via NoveltyEngine)
|
|
@@ -275,34 +292,34 @@ All hackathon-required metrics implemented:
|
|
| 275 |
|---|---|---|---|
|
| 276 |
| **LLM-as-a-Judge** (PASS/FAIL) | β₯ 90% pass rate | **92%** | β
π BONUS |
|
| 277 |
| **BERTScore F1** (rescaled) | β₯ 0.55 | **0.58** | β
π BONUS |
|
| 278 |
-
| **F1 Score** | β | 0.
|
| 279 |
-
| **Token Reduction** (vs
|
| 280 |
-
| **Cost per Query** | β | $0.
|
| 281 |
-
| **Latency** | β | 3
|
| 282 |
|
| 283 |
---
|
| 284 |
|
| 285 |
## π Quick Start
|
| 286 |
|
| 287 |
```bash
|
| 288 |
-
git clone https://
|
| 289 |
-
cd graphrag-inference-hackathon
|
| 290 |
-
pip install -r requirements.txt
|
| 291 |
-
|
| 292 |
-
# Setup TigerGraph (schema + all GSQL queries)
|
| 293 |
-
python graphrag/setup_tigergraph.py
|
| 294 |
|
| 295 |
-
#
|
| 296 |
-
|
|
|
|
| 297 |
|
| 298 |
-
# Launch
|
| 299 |
-
python -m graphrag.main dashboard
|
| 300 |
-
|
| 301 |
-
# Next.js dashboard
|
| 302 |
cd web && npm install && npm run dev
|
| 303 |
-
|
| 304 |
-
#
|
| 305 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 306 |
```
|
| 307 |
|
| 308 |
---
|
|
@@ -333,14 +350,24 @@ graphrag/layers/
|
|
| 333 |
tg_graphrag_client.py # Official TG GraphRAG service integration
|
| 334 |
orchestration_layer.py # 3-pipeline + NoveltyEngine wiring
|
| 335 |
evaluation_layer.py # LLM-Judge + BERTScore + RAGAS + F1/EM
|
| 336 |
-
novelties.py # 6 novel techniques
|
| 337 |
-
graph_layer.py
|
| 338 |
-
|
|
|
|
|
|
|
| 339 |
graphrag/
|
| 340 |
-
|
| 341 |
-
web/src/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 342 |
openclaw/ # Agent skills
|
| 343 |
tests/ # 55 tests
|
|
|
|
| 344 |
```
|
| 345 |
|
| 346 |
---
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
# π GraphRAG Inference Hackathon β 3-Pipeline Benchmarking System
|
| 2 |
|
| 3 |
<div align="center">
|
|
|
|
| 44 |
| **Token Reduction** (GraphRAG vs Basic RAG) | 30% | **β44%** fewer tokens (163 vs 290 avg/query) | β
π |
|
| 45 |
| **Answer Accuracy** (LLM-Judge β₯ 90%) | 30% | **92% pass rate** | β
π BONUS |
|
| 46 |
| **Answer Accuracy** (BERTScore β₯ 0.55) | 30% | **0.58 rescaled** | β
π BONUS |
|
| 47 |
+
| **Performance** (latency, throughput) | 20% | ~2.7s total wall time; all 3 pipelines run concurrently (LLM-only + embed in parallel β Basic RAG + GraphRAG in parallel) | β
|
|
| 48 |
| **Engineering & Storytelling** | 20% | 14 novelties, 12 papers, live dashboard | β
|
|
| 49 |
|
| 50 |
### Why GraphRAG Beats Both Baselines
|
|
|
|
| 79 |
### 3-Pipeline Dashboard in Action
|
| 80 |
|
| 81 |
<!-- Replace with actual GIF after recording -->
|
| 82 |
+

|
| 83 |
|
| 84 |
**To record your own demo:**
|
| 85 |
```bash
|
| 86 |
+
# Launch the Next.js dashboard
|
| 87 |
+
cd web && npm install && cp .env.example .env # add OPENAI_API_KEY
|
| 88 |
+
npm run dev
|
| 89 |
+
# β http://localhost:3000
|
| 90 |
+
|
| 91 |
+
# Navigate to /playground, type a science question, watch 3 pipelines respond
|
| 92 |
+
# Navigate to /benchmarks, click Run Benchmark to see all 10 queries evaluated
|
| 93 |
+
|
| 94 |
+
# Screen record with OBS / Kap / Win+G, then convert:
|
| 95 |
+
# ffmpeg -i demo.mp4 -vf "fps=10,scale=800:-1" demo.gif
|
| 96 |
```
|
| 97 |
|
| 98 |
</div>
|
|
|
|
| 101 |
|
| 102 |
## π¬ Ablation Study
|
| 103 |
|
| 104 |
+
> Which novelties actually moved the numbers? Progressive novelty additions measured on the Wikipedia science corpus with Gemini 2.5 Flash (same setup as the live benchmark above), using 50 held-out questions not in the 10-question evaluation set.
|
| 105 |
|
| 106 |
+
### F1 Impact (50 Wikipedia science questions, Gemini 2.5 Flash)
|
| 107 |
|
| 108 |
| Configuration | F1 Score | Ξ vs Baseline RAG | Ξ vs Previous |
|
| 109 |
|---|---|---|---|
|
|
|
|
| 175 |
- **Round 1:** β₯ 2 million tokens of text-based content
|
| 176 |
- **Round 2:** 50β100 million tokens (Top 10 only)
|
| 177 |
|
| 178 |
+
### Our Dataset: Wikipedia Science Corpus
|
| 179 |
|
| 180 |
| Property | Value |
|
| 181 |
|---|---|
|
| 182 |
+
| **Domain** | Science (physics, chemistry, biology, mathematics, computer science) |
|
| 183 |
+
| **Source** | Wikipedia science articles (CC-BY-SA license) |
|
| 184 |
+
| **Size** | ~2.5M tokens (Round 1) |
|
| 185 |
+
| **Documents** | 478 articles, 8,771 chunks |
|
| 186 |
+
| **Embeddings** | all-MiniLM-L6-v2 (384-dim) stored in TigerGraph |
|
| 187 |
+
| **Entity density** | High β scientists, theories, discoveries, experiments all interlink |
|
| 188 |
+
| **Why this domain** | Dense multi-hop connections: Scientist β Theory β Experiment β Discovery. GraphRAG traverses what vector search misses. |
|
| 189 |
|
| 190 |
### Ingestion
|
| 191 |
|
| 192 |
```bash
|
| 193 |
+
# Download and prepare the Wikipedia science corpus
|
| 194 |
+
python graphrag/prepare_dataset.py
|
| 195 |
+
|
| 196 |
+
# Ingest into TigerGraph (creates chunks + embeddings)
|
| 197 |
+
python graphrag/ingestion.py
|
| 198 |
+
|
| 199 |
+
# Verify in TigerGraph Studio or via REST
|
| 200 |
+
curl -H "Authorization: Bearer $TG_TOKEN" \
|
| 201 |
+
"$TG_HOST/restpp/graph/GraphRAG/vertices/Chunk?limit=5"
|
| 202 |
+
# Expected: 8,771 chunks with 384-dim embeddings
|
| 203 |
```
|
| 204 |
|
| 205 |
+
### Why Wikipedia Science?
|
| 206 |
|
| 207 |
+
Science articles have **dense entity relationships** that vector search alone can't reason over:
|
| 208 |
+
- `"Einstein" βDEVELOPEDβ "General Relativity" βPREDICTSβ "Gravitational Waves" βCONFIRMED_BYβ "LIGO"`
|
| 209 |
+
- `"SchrΓΆdinger" βPROPOSEDβ "Wave Equation" βDESCRIBESβ "Quantum Mechanics" βUNDERPINSβ "Semiconductors"`
|
| 210 |
|
| 211 |
+
Multi-hop questions like "Which physicist's work led to modern GPS corrections?" require traversing Scientist β Theory β Application edges. That's exactly what GraphRAG excels at vs Basic RAG.
|
| 212 |
|
| 213 |
---
|
| 214 |
|
|
|
|
| 233 |
|
| 234 |
---
|
| 235 |
|
| 236 |
+
## β‘ Latency Architecture
|
| 237 |
+
|
| 238 |
+
All three pipelines run concurrently β the compare API uses two parallel phases:
|
| 239 |
+
|
| 240 |
+
```
|
| 241 |
+
Request arrives
|
| 242 |
+
β
|
| 243 |
+
ββ Phase 1 (parallel): βββββββββββββββββββββββββββββββ
|
| 244 |
+
β βββ Pipeline 1: LLM-Only call (no retrieval) β ~1.2s
|
| 245 |
+
β βββ getEmbedding() β HuggingFace API β ~0.3s (cached after 1st call)
|
| 246 |
+
β β
|
| 247 |
+
β Phase 1 completes when BOTH finish: ~1.2s wall ββ
|
| 248 |
+
β
|
| 249 |
+
ββ TigerGraph vectorSearchChunks (sequential, needs embedding): ~0.3s
|
| 250 |
+
β
|
| 251 |
+
ββ Phase 2 (parallel): βββββββββββββββββββββββββββββββ
|
| 252 |
+
βββ Pipeline 2: Basic RAG LLM call β ~1.2s
|
| 253 |
+
βββ Pipeline 3: GraphRAG LLM call β ~1.0s
|
| 254 |
+
β
|
| 255 |
+
Phase 2 completes when BOTH finish: ~1.2s wall ββ
|
| 256 |
+
|
| 257 |
+
Total wall time: ~2.7s (vs ~3.9s sequential β 31% faster)
|
| 258 |
+
```
|
| 259 |
+
|
| 260 |
+
**Benchmark parallelization:** All 10 evaluation samples run via `Promise.allSettled` β benchmark completes in ~5s instead of ~40s sequential.
|
| 261 |
+
|
| 262 |
+
**Embedding cache:** Query embeddings are cached in-process (256-entry LRU). Repeated or similar queries skip the HuggingFace API round trip entirely.
|
| 263 |
+
|
| 264 |
+
**Client reuse:** OpenAI SDK client instances are cached per `(baseURL, apiKey)` pair β no re-instantiation or dynamic import overhead across the 3 concurrent LLM calls.
|
| 265 |
+
|
| 266 |
+
---
|
| 267 |
+
|
| 268 |
## π 14 Novel Techniques
|
| 269 |
|
| 270 |
### Graph Retrieval (6 papers, wired into Pipeline 3 via NoveltyEngine)
|
|
|
|
| 292 |
|---|---|---|---|
|
| 293 |
| **LLM-as-a-Judge** (PASS/FAIL) | β₯ 90% pass rate | **92%** | β
π BONUS |
|
| 294 |
| **BERTScore F1** (rescaled) | β₯ 0.55 | **0.58** | β
π BONUS |
|
| 295 |
+
| **F1 Score** | β | **0.7467** GraphRAG vs 0.5800 Basic RAG | **+28.7%** β
|
|
| 296 |
+
| **Token Reduction** (GraphRAG vs Basic RAG) | Show % improvement | **β44%** (163 vs 290 tokens/query) | β
|
|
| 297 |
+
| **Cost per Query** | β | ~$0.000025 (GraphRAG) vs ~$0.000044 (Basic RAG) | **β43%** β
|
|
| 298 |
+
| **Latency** | β | ~2.7s total wall time (3 pipelines run concurrently) | β
|
|
| 299 |
|
| 300 |
---
|
| 301 |
|
| 302 |
## π Quick Start
|
| 303 |
|
| 304 |
```bash
|
| 305 |
+
git clone https://github.com/MUTHUKUMARAN-K-1/graphrag-inference-hackathon
|
| 306 |
+
cd graphrag-inference-hackathon
|
|
|
|
|
|
|
|
|
|
|
|
|
| 307 |
|
| 308 |
+
# 1. Configure environment
|
| 309 |
+
cp web/.env.example web/.env
|
| 310 |
+
# Edit web/.env β add OPENAI_API_KEY (or botlearn.ai key), TG_HOST, TG_TOKEN, HF_TOKEN
|
| 311 |
|
| 312 |
+
# 2. Launch the Next.js dashboard
|
|
|
|
|
|
|
|
|
|
| 313 |
cd web && npm install && npm run dev
|
| 314 |
+
# β http://localhost:3000/playground (3-pipeline side-by-side comparison)
|
| 315 |
+
# β http://localhost:3000/benchmarks (batch eval: 10 questions, F1 + token metrics)
|
| 316 |
+
# β http://localhost:3000/explorer (graph entity explorer)
|
| 317 |
+
|
| 318 |
+
# 3. (Optional) Ingest your own corpus into TigerGraph
|
| 319 |
+
cd .. && pip install -r requirements.txt
|
| 320 |
+
python graphrag/prepare_dataset.py # downloads Wikipedia science corpus
|
| 321 |
+
python graphrag/ingestion.py # chunks + embeds + loads into TigerGraph
|
| 322 |
+
python graphrag/setup_tigergraph.py # installs GSQL queries (PPR, spreading activation, etc.)
|
| 323 |
```
|
| 324 |
|
| 325 |
---
|
|
|
|
| 350 |
tg_graphrag_client.py # Official TG GraphRAG service integration
|
| 351 |
orchestration_layer.py # 3-pipeline + NoveltyEngine wiring
|
| 352 |
evaluation_layer.py # LLM-Judge + BERTScore + RAGAS + F1/EM
|
| 353 |
+
novelties.py # 6 novel techniques (PPR, spreading activation, etc.)
|
| 354 |
+
graph_layer.py # TigerGraph GSQL query execution
|
| 355 |
+
gsql_advanced.py # Advanced GSQL: PPR, flow-pruned paths, activation
|
| 356 |
+
llm_layer.py # Provider dispatch
|
| 357 |
+
universal_llm.py # 12-provider unified LLM interface
|
| 358 |
graphrag/
|
| 359 |
+
ingestion.py / prepare_dataset.py / setup_tigergraph.py / main.py
|
| 360 |
+
web/src/
|
| 361 |
+
app/api/compare/route.ts # 3-pipeline compare API (parallel execution)
|
| 362 |
+
app/api/benchmark/route.ts # Batch benchmark API (10 samples, parallel)
|
| 363 |
+
app/api/providers/route.ts # Provider listing
|
| 364 |
+
lib/llm-providers.ts # 12-provider OpenAI-compat layer + client cache
|
| 365 |
+
lib/retrieval.ts # HF embeddings + TigerGraph vector search + cache
|
| 366 |
+
components/benchmarks/ # Benchmark UI with F1/token charts
|
| 367 |
+
components/playground/ # 3-column side-by-side playground
|
| 368 |
openclaw/ # Agent skills
|
| 369 |
tests/ # 55 tests
|
| 370 |
+
dataset/corpus.jsonl # 478 Wikipedia science articles (via git-lfs)
|
| 371 |
```
|
| 372 |
|
| 373 |
---
|