| # Solution Architecture: Agentic Graph RAG as a Service |
|
|
| This document outlines a detailed technical approach to building the Agentic Graph RAG platform. It focuses on modularity, scalability, and the specific requirements of the Lyzr Hackathon, with a strong emphasis on production-grade robustness. |
|
|
| ## 1. High-Level Architecture |
|
|
| The system is designed as a set of modular services centered around a shared Knowledge Graph and Vector Store, fortified with enterprise-grade security and observability layers. |
|
|
| ```mermaid |
| graph TD |
| User[User / Client] --> Auth[Auth & Access Control] |
| Auth --> API[Unified API Gateway] |
| |
| subgraph "Observability Layer (OpenTelemetry)" |
| Logs[Structured Logging] |
| Traces[Agent Traces] |
| Metrics[Performance Metrics] |
| end |
| |
| API -.-> Logs & Traces & Metrics |
| |
| subgraph "Ingestion Pipeline (Async Workers)" |
        API --> Queue["Task Queue (Redis/Celery)"]
| Queue --> Ingest[Ingestion Worker] |
| Ingest --> Chunking[Text Chunking] |
        Chunking --> OntologyGen["LLM Ontology Gen (Versioned)"]
| OntologyGen --> Extract[Entity & Relation Extraction] |
| Extract --> Resolution[Entity Resolution / Dedup] |
| Resolution --> GraphDB[(Neo4j / Neptune)] |
| Resolution --> VectorDB[(Vector Store)] |
| end |
| |
| subgraph "Retrieval Context" |
| API --> Agent[Agent Orchestrator] |
| Agent --> Decomp[Query Decomposer] |
| Decomp --> Router[Query Router / Planner] |
| |
| Router --> |Semantic Query| VectorSearch[Vector Search] |
| Router --> |Deep Relation| GraphSearch[Graph Traversal / Cypher] |
| Router --> |Structured| FilterSearch[Metadata Filter] |
| |
| VectorSearch & GraphSearch & FilterSearch --> Validator[Hallucination Guard / Schema Validator] |
| Validator --> Synthesizer[Response Synthesizer] |
| Synthesizer --> Agent |
| end |
| ``` |
|
|
| ## 2. Technology Stack Selection |
|
|
| * **Language:** Python 3.12 (Standard for AI/ML engineering). |
| * **API Framework:** FastAPI (Async support, auto-documentation). |
| * **Orchestration:** LlamaIndex (Preferred for Graph RAG). |
* **LLM:** Multi-provider support (Ollama, OpenAI, Gemini, Claude) wired through LangGraph for reasoning & extraction; `BAAI/bge-m3` for embeddings, served locally via Ollama.
* **Graph Database:** Neo4j (primary).
| * **Vector Store:** Neo4j Vector Index (for unified storage) or Qdrant/Chroma. |
| * **Task Queue:** Celery with Redis (for async ingestion). |
| * **Monitoring:** OpenTelemetry + Prometheus/Grafana. |
* **Frontend:** React (Vite) + Tailwind CSS (for the Visual Ontology Editor).
|
|
| ## 3. Production-Grade Components |
|
|
| ### A. Document-to-Graph Pipeline (Ingestion) |
|
|
| This pipeline converts unstructured text into a structured Knowledge Graph, robust to schema changes and duplicates. |
|
|
| 1. **Ontology Generation & Evolution:** |
    * *Initial:* Ask the LLM to identify high-level concepts (nodes) and interactions (edges) from the first $N$ chunks.
| * *Visual Editor:* Human approval step to refine the JSON schema. |
| * **Drift Handling:** Incorporate an "Ontology Versioning" system. Every node/edge is tagged with `ontology_version: v1.0`. New documents causing schema changes trigger a "Migration Proposal" for approval. |
|
|
| 2. **Extraction & Embedding:** |
| * **Prompt Engineering:** "Given text + Ontology v1.0, extract entities/relationships." |
| * **Hybrid Nodes:** Create `(:Chunk)` nodes linked to `(:Entity)` nodes (`(:Chunk)-[:MENTIONS]->(:Entity)`). This preserves ground truth source text alongside abstract graph relationships. |
|
|
| 3. **Advanced Entity Resolution:** |
| * *Naive:* Exact string match. |
| * *Production:* Multi-stage blocking and merging. |
| 1. **Blocking:** Group entities by Label and similar name (e.g., phonetic match). |
| 2. **Semantic Check:** Compare embeddings of candidates. |
        3. **Threshold:** If similarity > 0.95 -> Auto-merge. If 0.85-0.95 -> Flag for "Human Review Queue" (see the sketch below).
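
Below is a minimal sketch of the three resolution stages, assuming in-memory candidate lists and pre-computed embeddings; production would back this with a blocking index and a persisted review queue.

```python
import difflib
import math
from dataclasses import dataclass, field

AUTO_MERGE = 0.95   # thresholds from the policy above
REVIEW_MIN = 0.85

@dataclass
class Entity:
    name: str
    label: str                       # e.g. "Person", "Company"
    embedding: list[float] = field(default_factory=list)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def resolve(new: Entity, existing: list[Entity]) -> tuple[str, Entity | None]:
    # Stage 1 (blocking): same label and a lexically similar name.
    candidates = [
        e for e in existing
        if e.label == new.label
        and difflib.SequenceMatcher(None, e.name.lower(), new.name.lower()).ratio() > 0.6
    ]
    # Stage 2 (semantic check): compare embeddings of the surviving candidates.
    best, best_sim = None, 0.0
    for cand in candidates:
        sim = cosine(new.embedding, cand.embedding)
        if sim > best_sim:
            best, best_sim = cand, sim
    # Stage 3 (thresholds): auto-merge, human review, or insert as a new node.
    if best_sim > AUTO_MERGE:
        return "merge", best
    if best_sim >= REVIEW_MIN:
        return "review", best        # goes to the Human Review Queue
    return "insert", None
```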
|
|
| ### B. The Agentic Retrieval System (The Brain) |
|
|
| A state machine loop designed for accuracy and fail-safe operation. |
|
|
| **1. Query Decomposition & Routing** |
Instead of answering in a single step, the Agent breaks the query into sub-problems:
| * *User Query:* "How is the CEO of Lyzr related to OpenAI?" |
| * *Decomposition:* |
| 1. "Identify Lyzr CEO" (Vector/Graph lookup) -> *Result: user_X* |
| 2. "Find path between user_X and OpenAI" (Graph traversal). |
* *Router:* Dynamically selects tools for each sub-step (sketched below).
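
A minimal sketch of the decompose-then-route loop, assuming the `LLMProvider` interface defined in Section 3C and a registry of tool callables; the prompt wording and tool names are illustrative.

```python
import json

TOOLS = {"vector_search", "graph_traversal", "metadata_filter"}  # registered tool names

def plan(llm, query: str) -> list[dict]:
    """Ask the LLM for a step plan; fall back to one vector step on malformed output."""
    prompt = (
        "Break the user query into ordered sub-steps.\n"
        'Return a JSON list like: [{"step": "...", "tool": "vector_search"}]\n'
        f"Allowed tools: {sorted(TOOLS)}\nQuery: {query}"
    )
    try:
        steps = json.loads(llm.complete(prompt))
        if isinstance(steps, list) and all(s.get("tool") in TOOLS for s in steps):
            return steps
    except (json.JSONDecodeError, AttributeError, TypeError):
        pass
    return [{"step": query, "tool": "vector_search"}]   # graceful degradation

def run(llm, tools: dict, query: str) -> list:
    results = []
    for s in plan(llm, query):
        # Later sub-steps can reference earlier results (e.g. "user_X") via context.
        results.append(tools[s["tool"]](s["step"], context=results))
    return results
```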
| |
| **2. Tool Implementation with Guardrails:** |
| * **Vector Tool:** Top-k retrieval using embedding similarity. |
| * **Graph Tool (Text-to-Cypher):** Uses LLM to generate Cypher. |
    * **Hallucination Guard:** The tool injects the *strict* allowed schema into the prompt. Generated Cypher is parsed and validated against a "Relationship Whitelist" before execution to prevent schema injection or invalid edge types (see the sketch after this list).
| * **Filter Tool:** Converts natural language to structured DB filters (WHERE clauses). |
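
A minimal sketch of the whitelist check behind the Hallucination Guard. The labels and relationship types here stand in for whatever Ontology v1.0 actually allows, and a production guard would use a real Cypher parser rather than regexes.

```python
import re

ALLOWED_LABELS = {"Chunk", "Entity", "Person", "Company"}   # stand-ins for Ontology v1.0
ALLOWED_RELS = {"MENTIONS", "WORKS_AT", "INVESTED_IN"}      # the Relationship Whitelist

REL_PATTERN = re.compile(r"\[\s*\w*\s*:\s*(\w+)")    # matches [:REL] or [r:REL]
LABEL_PATTERN = re.compile(r"\(\s*\w*\s*:\s*(\w+)")  # matches (:Label) or (n:Label)

def validate_cypher(cypher: str) -> None:
    """Reject generated Cypher that references types outside the approved schema."""
    bad_rels = set(REL_PATTERN.findall(cypher)) - ALLOWED_RELS
    bad_labels = set(LABEL_PATTERN.findall(cypher)) - ALLOWED_LABELS
    if bad_rels or bad_labels:
        raise ValueError(f"Schema violation: rels={bad_rels}, labels={bad_labels}")
    if re.search(r"\b(CREATE|MERGE|DELETE|SET|DROP)\b", cypher, re.IGNORECASE):
        raise ValueError("Write operations are not allowed from the retrieval agent")
```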
| |
| **3. Latency & Performance Strategy:** |
| * **Timeouts:** Hard limit on agent reasoning steps (e.g., max 5 loops). |
* **Fallback:** If the Graph tool fails or times out, degrade gracefully to pure Vector Search for a "best effort" answer (see the sketch below).
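
One possible shape for the fallback, sketched with a thread-pool timeout; `graph_tool` and `vector_tool` stand in for the tools above, and the reasoning-step cap is enforced separately in the agent loop.

```python
from concurrent.futures import ThreadPoolExecutor

GRAPH_TIMEOUT_S = 10.0   # wall-clock budget for a single graph query
_pool = ThreadPoolExecutor(max_workers=4)   # shared executor; never blocks on shutdown

def answer_with_fallback(graph_tool, vector_tool, sub_query: str) -> dict:
    """Run the graph tool under a timeout; degrade to vector search on any failure."""
    future = _pool.submit(graph_tool, sub_query)
    try:
        return {"degraded": False, "result": future.result(timeout=GRAPH_TIMEOUT_S)}
    except Exception:  # covers TimeoutError and Cypher execution errors
        # Note: a timed-out worker thread is abandoned, not killed.
        return {"degraded": True, "result": vector_tool(sub_query)}
```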
| |
| ### C. Parity & Extensibility Layer |
| |
| We define abstract base class interfaces to ensure no vendor lock-in. |
| |
| ```python |
from abc import ABC, abstractmethod
from typing import Any, List


class GraphStore(ABC):
    @abstractmethod
    def execute_query(self, query: str, params: dict) -> Any: ...


class VectorStore(ABC):
    @abstractmethod
    def search(self, query_vector: List[float], k: int) -> List[Any]: ...


class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...


# Implementations: Neo4jStore, NeptuneStore, QdrantStore, OpenAIProvider, etc.
| ``` |
| |
| ## 4. Scalability, Security & Observability |
| |
| To meet "Production-Grade" criteria, these non-functional requirements are critical: |
| |
| 1. **Access Control (RBAC):** |
| * Pre-retrieval enforcement. |
    * All queries filter by `user.tenant_id` or `user.permissions` so that users only retrieve data they are authorized to see (sketched below).
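
A minimal sketch of pre-retrieval enforcement, assuming ingestion stamps every node with a `tenant_id` property and that the vector store accepts a metadata `filter` argument (illustrative):

```python
def scoped_graph_query(store, user, cypher: str, params: dict):
    """The tenant filter is bound server-side from the session, never by the LLM."""
    if "$tenant_id" not in cypher:
        raise PermissionError("Query template must filter on $tenant_id")
    return store.execute_query(cypher, {**params, "tenant_id": user.tenant_id})

def scoped_vector_search(index, user, query_vector, k: int = 5):
    # The filter is applied before similarity scoring, not on the returned results.
    return index.search(query_vector, k=k, filter={"tenant_id": user.tenant_id})
```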
| |
| 2. **Observability:** |
    * **Tracing:** Log every step of the Agent's reasoning chain (Input -> Decomp -> Tool Call -> Result). This is vital for answering "why did the bot say that?" (see the sketch below).
| * **Metrics:** Track Token Usage, Latency p95, and Cache Hit Rates. |
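
A sketch of per-step tracing using the OpenTelemetry API; the attribute names are our own convention, not an OTel standard:

```python
from opentelemetry import trace

tracer = trace.get_tracer("agent.orchestrator")

def traced_step(step_name: str, tool, payload: str):
    """Wrap each reasoning step in a span so the full chain can be reconstructed."""
    with tracer.start_as_current_span(step_name) as span:
        span.set_attribute("agent.input", payload[:500])      # truncate large payloads
        result = tool(payload)
        span.set_attribute("agent.output_size", len(str(result)))
        return result
```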
| |
| 3. **Async Ingestion:** |
| * Ingestion is decoupled from the user request loop. |
    * File Upload API -> Pushes ID to Redis Queue -> Background Worker picks up -> Runs Extraction -> Updates Graph (see the sketch below).
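
A sketch of the worker side with Celery; `load_document`, `chunk_text`, and `extract_and_upsert` are placeholders for the pipeline in Section 3A:

```python
from celery import Celery

app = Celery("ingestion", broker="redis://localhost:6379/0")

@app.task(bind=True, max_retries=3)
def ingest_document(self, doc_id: str):
    """Background worker: load -> chunk -> extract -> resolve -> write to graph."""
    try:
        text = load_document(doc_id)        # placeholder loader
        for chunk in chunk_text(text):      # placeholder chunker
            extract_and_upsert(chunk)       # extraction + entity resolution
    except Exception as exc:
        raise self.retry(exc=exc, countdown=30)

# The upload endpoint only enqueues and returns immediately:
# ingest_document.delay(doc_id)
```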
| |
| 4. **Caching Strategy:** |
    * **Semantic Cache (Redis):** Before hitting the LLM, check whether a semantically similar query was answered recently; this reduces cost and latency (sketched below).
| * **Embedding Cache:** Store computed embeddings to avoid re-calculation for identical text chunks. |
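
A minimal semantic-cache sketch over plain Redis; the linear scan is hackathon-grade, and production would swap in a Redis vector index:

```python
import json
import numpy as np
import redis

r = redis.Redis()
SIM_THRESHOLD = 0.92   # how close two queries must be to share an answer

def semantic_cache_get(query_vec: np.ndarray) -> str | None:
    """Linear scan over cached (embedding, answer) pairs; fine for a demo."""
    for key in r.scan_iter("semcache:*"):
        blob = r.get(key)
        if blob is None:                    # key may expire mid-scan
            continue
        entry = json.loads(blob)
        cached = np.array(entry["embedding"])
        denom = np.linalg.norm(query_vec) * np.linalg.norm(cached)
        if denom and float(query_vec @ cached) / denom >= SIM_THRESHOLD:
            return entry["answer"]
    return None

def semantic_cache_put(query: str, query_vec: np.ndarray, answer: str, ttl: int = 3600):
    payload = json.dumps({"embedding": query_vec.tolist(), "answer": answer})
    r.set(f"semcache:{hash(query)}", payload, ex=ttl)
```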
| |
| ## 5. Implementation Plan |
| |
| ### Phase 1: Foundation (Hours 1-4) |
1. Set up repository, Python environment, and local services (Neo4j/Redis).
| 2. Implement `GraphStore` & `VectorStore` abstractions. |
| 3. Create Basic Auth & Middleware logging. |
| |
| ### Phase 2: Ingestion Engine (Hours 5-12) |
| 1. Implement PDF extractor & Async Worker skeleton. |
| 2. Build "Ontology Proposer" & "Graph Extractor" prompts. |
| 3. Implement Entity Resolution logic. |
| |
| ### Phase 3: The Retrieval Agent (Hours 13-20) |
| 1. Set up Agent loop with Query Decomposition. |
| 2. Implement `Text2Cypher` with schema validation. |
| 3. Implement Latency Timeouts & Fallbacks. |
| |
| ### Phase 4: Refinement & UI (Hours 21-24) |
1. Build the Visual Ontology Editor (React + Vite).
| 2. Add simple Evaluation Script (run known queries, check answers). |
| 3. Write `README.md` highlighting the "Production Thinking" (RBAC, Async, Observability). |
| |
| ## 6. Key Innovations |
| 1. **Hybrid Chunk Nodes:** Storing source text explicitly in the graph for ground-truth verification. |
2. **Self-Correcting Cypher:** If Cypher execution fails, feed the error back to the LLM to fix the syntax automatically (see the sketch after this list).
| 3. **Adaptive Retrieval:** The agent assigns a "confidence score" to each retrieval method. If Vector Search confidence is low (<0.7), it automatically triggers Graph Traversal to boost context. |
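
A sketch of the self-correction loop (innovation 2), using the Section 3C interfaces; the prompt wording is illustrative:

```python
def self_correcting_cypher(llm, store, question: str, schema: str, max_attempts: int = 3):
    """Generate Cypher; on execution errors, feed the error back for a corrected query."""
    prompt = f"Schema:\n{schema}\n\nWrite a read-only Cypher query answering: {question}"
    for _ in range(max_attempts):
        cypher = llm.complete(prompt)
        try:
            return store.execute_query(cypher, {})
        except Exception as err:
            # Append the failing query and error so the next attempt can repair it.
            prompt += (
                f"\n\nPrevious query:\n{cypher}\nFailed with: {err}\n"
                "Return a corrected query."
            )
    raise RuntimeError("Cypher self-correction exhausted all attempts")
```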
| |