Massive README: 14 novelties from 10 papers, full architecture, paper citations, test counts"

36c721f verified 12 days ago

preview code

raw

history blame

13.7 kB

🔍 GraphRAG Inference Hackathon — Dual Pipeline System

Proving that graphs make LLM inference faster, cheaper, and smarter — backed by 10 research papers.

14 Novelties · Architecture · Quick Start · Benchmarks · Papers

🎯 What This Is

A dual-pipeline GraphRAG system with 14 novel techniques from cutting-edge 2024–2025 research, 12 LLM providers (including free Ollama local), OpenClaw agent integration, and a production Next.js dashboard — all built on TigerGraph.

Pipeline A (Baseline)	Pipeline B (GraphRAG)
Query → LLM → Answer	Query → PolyG Router → PPR Scoring → Spreading Activation → Path Pruning → Token Budget → LLM → Answer
Simple, expensive	Smart, graph-enhanced, cost-controlled

🌟 14 Novel Techniques

Graph Retrieval Innovations (from 6 papers)

#	Technique	Paper	Key Result	Implementation
1	PPR Confidence-Weighted Retrieval	CatRAG `2602.01965`	Best reasoning completeness on 4 benchmarks	`PPRConfidenceScorer` — Personalized PageRank from seed entities, scores = context confidence
2	Spreading Activation Context Scoring	SA-RAG `2512.15922`	+39% answer correctness on MuSiQue	`SpreadingActivation` — propagates activation through graph with decay, ranks by signal strength
3	Flow-Pruned Path Serialization	PathRAG `2502.14902`	62–65% win rate vs LightRAG	`PathPruner` — finds reasoning paths, prunes by flow threshold, serializes high-reliability first (exploits lost-in-the-middle bias)
4	Graph Token Budget Controller	TERAG `2509.18667`	97% token reduction at 80%+ accuracy	`TokenBudgetController` — caps context by token limit, prioritizes by score × relevance
5	PolyG Hybrid Retrieval Router	RAGRouter-Bench `2602.00296`	Adaptive > any fixed paradigm	`PolyGRouter` — 4-class query taxonomy (entity/relation/multi-hop/summarization) → optimal strategy
6	Incremental Graph Updates	TG-RAG `2510.13590`	O(new) vs O(all) recomputation	`IncrementalGraphUpdater` — merge by embedding similarity, scoped community re-detection

Architecture Innovations

#	Technique	Paper	Description
7	Schema-Bounded Entity Extraction	Youtu-GraphRAG `2508.19855`	9 entity types + 15 relation types — ~90% extraction cost reduction, +16% accuracy
8	Dual-Level Keyword Retrieval	LightRAG `2410.05779`	High-level (themes) + low-level (entities) keywords for dual-channel retrieval
9	Adaptive Query Complexity Router	Original	LLM scores query complexity 0.0–1.0 → routes simple to baseline, complex to GraphRAG
10	Graph Reasoning Path Explanation	Original	Natural language step-by-step traversal explanation (Entry → Traversal → Evidence → Conclusion)

System Innovations

#	Technique	Description
11	12-Provider Universal LLM	Single interface for OpenAI, Claude, Gemini, Mistral, Ollama, Groq, DeepSeek, etc.
12	OpenClaw Agent Skills	GraphRAG as autonomous agent capabilities (CIK model: SOUL + IDENTITY + MEMORY + Skills)
13	Live Dashboard Benchmarking	"Run Benchmark Now" button — judges can evaluate both pipelines in real-time
14	Advanced GSQL Queries	PPR, shortest paths, spreading activation, neighborhood extraction — all as installable TigerGraph queries

🏗️ Architecture (AI Factory — 4 Layers)

┌──────────────────────────────────────────────────────────────────────────┐
│  LAYER 4: EVALUATION                                                      │
│  RAGAS │ F1/EM │ Token Tracking │ Live Benchmark │ Next.js Dashboard      │
├──────────────────────────────────────────────────────────────────────────┤
│  LAYER 3: UNIVERSAL LLM (12 Providers)                                    │
│  OpenAI │ Claude │ Gemini │ Mistral │ Ollama │ Groq │ DeepSeek │ …       │
├──────────────────────────────────────────────────────────────────────────┤
│  LAYER 2: INFERENCE ORCHESTRATION + NOVELTY ENGINE                        │
│  ┌─ PolyG Router ─→ PPR Scoring ─→ Spreading Activation ─┐              │
│  │  Path Pruning ─→ Token Budget ─→ Structured Context     │              │
│  ├─ Pipeline A: Baseline (Query → Vector → LLM)           │              │
│  └─ Pipeline B: GraphRAG (Query → Graph → Novelties → LLM)│              │
├──────────────────────────────────────────────────────────────────────────┤
│  LAYER 1: GRAPH (TigerGraph)                                              │
│  GSQL: PPR │ Shortest Paths │ Spreading Activation │ Vector Search        │
│  Schema: Document → Chunk → Entity → Community                            │
│  Incremental Updates │ Schema-Bounded Extraction                          │
└──────────────────────────────────────────────────────────────────────────┘

How the Novelty Engine Works (Pipeline B)

Query: "Were Einstein and Newton of the same nationality?"

Step 1: PolyG Router → "multi_hop" (score=0.7) → use graph_traversal
Step 2: PPR from seeds [Einstein, Newton] → score all reachable entities
Step 3: Spreading Activation → expand to 2-hop neighborhood with decay
Step 4: Combined scoring (0.6×PPR + 0.4×Activation) per chunk
Step 5: Token Budget (2000 tokens) → select top chunks, prune 60%+ redundancy
Step 6: Path Serialization → "Einstein →BORN_IN→ Germany, Newton →BORN_IN→ England"
Step 7: LLM generates answer with ranked, pruned, path-structured context

🚀 Quick Start

# Option A: Next.js Dashboard
cd web && npm install && npm run dev    # → http://localhost:3000

# Option B: Docker
docker build -t graphrag . && docker run -p 3000:3000 graphrag

# Option C: Python CLI
pip install -r requirements.txt && python -m graphrag.main demo

# Option D: Ollama (100% free)
ollama pull llama3.2 && cd web && npm install && npm run dev

🤖 12 LLM Providers

Provider	Model	Cost	Speed
Ollama 🦙	llama3.2	$0	⚡ Local
HuggingFace	Llama 3.3 70B	$0	🔵 Medium
DeepSeek	DeepSeek V3	$0.00014/1K	⚡ Fast
OpenAI	GPT-4o-mini	$0.00015/1K	⚡ Fast
Groq	Llama 3.3 70B	$0.0006/1K	⚡⚡ Blazing
Gemini	2.0 Flash	$0.0001/1K	⚡ Fast
Mistral	Large	$0.002/1K	🔵 Medium
Anthropic	Claude Sonnet 4	$0.003/1K	🔵 Medium
OpenRouter	200+ models	Varies	Varies
Cohere	Command R+	$0.0025/1K	🔵 Medium
xAI	Grok 3	$0.003/1K	🔵 Medium
Together	Llama 3.1 70B	$0.0009/1K	⚡ Fast

📊 Benchmarks

Live Benchmark (from Dashboard)

Click "🏃 Run Benchmark Now" → evaluates both pipelines on HotpotQA with real F1/EM.

Expected Performance (HotpotQA)

Metric	Baseline	GraphRAG	Δ	Winner
F1 Score	~0.45–0.60	~0.55–0.70	+13–21%	✅ GraphRAG
Exact Match	~0.30–0.45	~0.35–0.50	+11%	✅ GraphRAG
Tokens/Query	~800–1000	~1200–1800*	—	✅ Baseline
F1 Win Rate	—	~55–70%	—	✅ GraphRAG

*With Token Budget Controller, GraphRAG context is capped at 2000 tokens — 40–60% reduction vs. unbounded.

By Question Type

Type	Baseline F1	GraphRAG F1	Δ	Why
Bridge (multi-hop)	~0.52	~0.63	+21%	Graph traversal connects cross-document facts
Comparison	~0.58	~0.61	+5%	Entity-pair paths provide structured comparison context

🦞 OpenClaw Agent Integration

Component	File	Purpose
SOUL.md	`openclaw/SOUL.md`	Agent identity, values, boundaries
IDENTITY.md	`openclaw/IDENTITY.md`	Provider config, schema, channels
MEMORY.md	`openclaw/MEMORY.md`	Learned performance knowledge
graph_query	`openclaw/skills/graph_query/`	NL → knowledge graph traversal
compare_pipelines	`openclaw/skills/compare_pipelines/`	Dual-pipeline comparison
cost_estimate	`openclaw/skills/cost_estimate/`	12-provider cost projection

🧪 Testing

python tests/test_core.py        # 31 tests — core functions
python tests/test_novelties.py   # 24 tests — all 6 novelty techniques
# Total: 55 tests covering PPR, activation, routing, paths, budgets, F1/EM

📁 Project Structure (75 files, 280KB)

├── web/                                # Next.js 15 Dashboard
│   ├── src/app/api/
│   │   ├── compare/route.ts            # Multi-provider dual-pipeline API
│   │   ├── benchmark/route.ts          # Live benchmark with F1/EM
│   │   └── providers/route.ts          # Provider health + listing
│   ├── src/components/tabs/
│   │   ├── LiveCompare.tsx             # Provider selector + comparison
│   │   ├── Benchmark.tsx               # Live "Run Now" + charts
│   │   ├── CostAnalysis.tsx            # 12-provider projections
│   │   └── GraphExplorer.tsx           # Interactive SVG graph
│   └── src/lib/
│       ├── llm-providers.ts            # 12-provider universal client
│       └── design-tokens.ts            # TigerGraph×Claude tokens
│
├── graphrag/layers/
│   ├── graph_layer.py                  # Layer 1: TigerGraph + GSQL
│   ├── orchestration_layer.py          # Layer 2: Dual pipeline + routing
│   ├── llm_layer.py                    # Layer 3: LLM interactions
│   ├── universal_llm.py               # Layer 3: 12-provider support
│   ├── evaluation_layer.py            # Layer 4: RAGAS + F1/EM
│   ├── novelties.py                   # 🌟 6 novel techniques (NEW)
│   └── gsql_advanced.py               # 🌟 Advanced GSQL queries (NEW)
│
├── openclaw/                           # OpenClaw Agent (CIK model)
├── tests/
│   ├── test_core.py                    # 31 core tests
│   └── test_novelties.py              # 24 novelty tests (NEW)
├── Dockerfile
└── README.md

📚 References

Directly Implemented (6 papers)

CatRAG — PPR + Dynamic Edge Weighting — arXiv:2602.01965 (Feb 2025)
PathRAG — Flow-Pruned Path Retrieval — arXiv:2502.14902 (Feb 2025)
TERAG — Token-Efficient Graph RAG — arXiv:2509.18667 (Sep 2024)
SA-RAG — Spreading Activation Retrieval — arXiv:2512.15922 (Dec 2024)
RAGRouter-Bench — Hybrid Routing — arXiv:2602.00296 (Feb 2025)
TG-RAG — Incremental Temporal Graph — arXiv:2510.13590 (Oct 2024)

Architecture Inspiration (4 papers)

GraphRAG — Microsoft's Community-Based RAG — arXiv:2404.16130
LightRAG — Dual-Level Retrieval (34K⭐) — arXiv:2410.05779
Youtu-GraphRAG — Schema-Bounded Extraction (Tencent) — arXiv:2508.19855
HippoRAG 2 — PPR + Passage Integration — arXiv:2502.14802

Datasets & Evaluation

HotpotQA — Multi-hop QA benchmark
RAGAS — RAG evaluation framework

🏆 Built for the GraphRAG Inference Hackathon by TigerGraph

14 Novel Techniques · 10 Research Papers · 12 LLM Providers · 55 Unit Tests · OpenClaw Agent · Docker

Proving that graphs make LLM inference faster, cheaper, and smarter.