Solution Architecture: Agentic Graph RAG as a Service
This document outlines a detailed technical approach to building the Agentic Graph RAG platform. It focuses on modularity, scalability, and the specific requirements of the Lyzr Hackathon, with a strong emphasis on production-grade robustness.
1. High-Level Architecture
The system is designed as a set of modular services centered around a shared Knowledge Graph and Vector Store, fortified with enterprise-grade security and observability layers.
```mermaid
graph TD
    User[User / Client] --> Auth[Auth & Access Control]
    Auth --> API[Unified API Gateway]

    subgraph "Observability Layer (OpenTelemetry)"
        Logs[Structured Logging]
        Traces[Agent Traces]
        Metrics[Performance Metrics]
    end
    API -.-> Logs & Traces & Metrics

    subgraph "Ingestion Pipeline (Async Workers)"
        API --> Queue["Task Queue (Redis/Celery)"]
        Queue --> Ingest[Ingestion Worker]
        Ingest --> Chunking[Text Chunking]
        Chunking --> OntologyGen["LLM Ontology Gen (Versioned)"]
        OntologyGen --> Extract[Entity & Relation Extraction]
        Extract --> Resolution[Entity Resolution / Dedup]
        Resolution --> GraphDB[(Neo4j / Neptune)]
        Resolution --> VectorDB[(Vector Store)]
    end

    subgraph "Retrieval Context"
        API --> Agent[Agent Orchestrator]
        Agent --> Decomp[Query Decomposer]
        Decomp --> Router[Query Router / Planner]
        Router -->|Semantic Query| VectorSearch[Vector Search]
        Router -->|Deep Relation| GraphSearch[Graph Traversal / Cypher]
        Router -->|Structured| FilterSearch[Metadata Filter]
        VectorSearch & GraphSearch & FilterSearch --> Validator[Hallucination Guard / Schema Validator]
        Validator --> Synthesizer[Response Synthesizer]
        Synthesizer --> Agent
    end
```
2. Technology Stack Selection
- Language: Python 3.12 (Standard for AI/ML engineering).
- API Framework: FastAPI (Async support, auto-documentation).
- Orchestration: LlamaIndex (Preferred for Graph RAG).
- LLM: Multi-LLM support (Ollama, OpenAI, Gemini, Claude) orchestrated via LangGraph (Reasoning & Extraction).
- Embeddings: BAAI bge-m3, served via Ollama.
- Graph Database: Neo4j (Primary).
- Vector Store: Neo4j Vector Index (for unified storage) or Qdrant/Chroma.
- Task Queue: Celery with Redis (for async ingestion).
- Monitoring: OpenTelemetry + Prometheus/Grafana.
- Frontend: React (Vite) + Tailwind CSS (for the Visual Ontology Editor).
3. Production-Grade Components
A. Document-to-Graph Pipeline (Ingestion)
This pipeline converts unstructured text into a structured Knowledge Graph, robust to schema changes and duplicates.
Ontology Generation & Evolution:
- Initial: Ask LLM to identify high-level concepts (nodes) and interactions (edges) from first $N$ chunks.
- Visual Editor: Human approval step to refine the JSON schema.
- Drift Handling: Incorporate an "Ontology Versioning" system. Every node/edge is tagged with
ontology_version: v1.0. New documents causing schema changes trigger a "Migration Proposal" for approval.
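The versioning-and-drift flow above can be sketched as a pure-Python check. This is a minimal illustration, not a fixed API: the `Ontology` shape, `check_drift` name, and the "migration proposal" dict layout are all assumptions introduced here.

```python
# Sketch of ontology drift detection. All names are illustrative.
from dataclasses import dataclass, field


@dataclass
class Ontology:
    version: str
    node_labels: set = field(default_factory=set)
    edge_types: set = field(default_factory=set)


def check_drift(ontology: Ontology, extracted_labels: set, extracted_edges: set) -> dict:
    """Compare newly extracted schema elements against the approved ontology.

    Returns a 'migration proposal'; empty lists mean the new document
    fits the current ontology version and needs no approval step.
    """
    new_labels = extracted_labels - ontology.node_labels
    new_edges = extracted_edges - ontology.edge_types
    return {
        "base_version": ontology.version,
        "new_node_labels": sorted(new_labels),
        "new_edge_types": sorted(new_edges),
        "requires_approval": bool(new_labels or new_edges),
    }


v1 = Ontology(version="v1.0", node_labels={"Person", "Company"}, edge_types={"WORKS_AT"})
proposal = check_drift(v1, {"Person", "Product"}, {"WORKS_AT", "MAKES"})
```

A proposal with `requires_approval: True` would be routed to the Visual Editor for human sign-off before the graph schema is migrated.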
Extraction & Embedding:
- Prompt Engineering: "Given text + Ontology v1.0, extract entities/relationships."
- Hybrid Nodes: Create `(:Chunk)` nodes linked to `(:Entity)` nodes (`(:Chunk)-[:MENTIONS]->(:Entity)`). This preserves the ground-truth source text alongside abstract graph relationships.
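One plausible upsert for the hybrid-node pattern, expressed as parameterized Cypher wrapped in Python (the property names and helper are assumptions for illustration; in practice the statement would run through the `GraphStore` abstraction):

```python
# Illustrative Cypher for hybrid Chunk/Entity nodes; parameters keep the
# query injection-safe. Property names here are assumptions, not a fixed schema.
MENTIONS_CYPHER = """
MERGE (c:Chunk {chunk_id: $chunk_id})
  ON CREATE SET c.text = $text, c.ontology_version = $version
MERGE (e:Entity {name: $entity_name, label: $entity_label})
MERGE (c)-[:MENTIONS]->(e)
"""


def mention_params(chunk_id: str, text: str, entity_name: str,
                   entity_label: str, version: str = "v1.0") -> dict:
    """Bundle the parameters for one MENTIONS upsert."""
    return {"chunk_id": chunk_id, "text": text, "entity_name": entity_name,
            "entity_label": entity_label, "version": version}


params = mention_params("doc1#3", "Lyzr builds agent frameworks.", "Lyzr", "Company")
```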
Advanced Entity Resolution:
- Naive: Exact string match.
- Production: Multi-stage blocking and merging.
- Blocking: Group entities by Label and similar name (e.g., phonetic match).
- Semantic Check: Compare embeddings of candidates.
- Threshold: If similarity > 0.95 -> Auto-merge. If 0.85-0.95 -> Flag for "Human Review Queue".
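The blocking/merging stages above can be condensed into a small sketch. The toy two-dimensional embeddings and the prefix-based block key are stand-ins (real blocking would use a phonetic key and bge-m3 embeddings); only the 0.95 / 0.85 thresholds come from the design.

```python
# Minimal entity-resolution sketch: blocking by label + name prefix, then a
# cosine-similarity check with the auto-merge / review thresholds above.
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def block_key(entity):
    # Cheap blocking: same label and same first three letters of the name.
    return (entity["label"], entity["name"][:3].lower())


def resolve(e1, e2, auto=0.95, review=0.85):
    """Return 'merge', 'review', or 'distinct' for a candidate pair."""
    if block_key(e1) != block_key(e2):
        return "distinct"          # never compared: different blocks
    sim = cosine(e1["vec"], e2["vec"])
    if sim > auto:
        return "merge"             # auto-merge
    if sim > review:
        return "review"            # push to Human Review Queue
    return "distinct"


a = {"name": "OpenAI", "label": "Company", "vec": [1.0, 0.0]}
b = {"name": "OpenAI Inc", "label": "Company", "vec": [0.99, 0.01]}
c = {"name": "OpenCV", "label": "Company", "vec": [0.0, 1.0]}
```

Blocking keeps the pairwise comparisons tractable: only entities sharing a block ever reach the (more expensive) embedding check.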
B. The Agentic Retrieval System (The Brain)
A state machine loop designed for accuracy and fail-safe operation.
1. Query Decomposition & Routing
Instead of answering in a single step, the Agent breaks down complexity:
- User Query: "How is the CEO of Lyzr related to OpenAI?"
- Decomposition:
- "Identify Lyzr CEO" (Vector/Graph lookup) -> Result: user_X
- "Find path between user_X and OpenAI" (Graph traversal).
- Router: Dynamically selects tools for each sub-step.
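The router's dispatch shape might look like the sketch below. In the real system an LLM classifies each sub-question; here the intent is passed in explicitly so the sketch stays pure, and the tool names are assumptions.

```python
# Dispatch shape for the Query Router. Intent classification is stubbed:
# the LLM's job is reduced to the `intent` field on each sub-question.
def route(sub_question: dict) -> str:
    tools = {
        "semantic": "vector_search",
        "relation": "graph_traversal",
        "structured": "metadata_filter",
    }
    # Default to vector search as the safest general-purpose tool.
    return tools.get(sub_question.get("intent"), "vector_search")


plan = [
    {"text": "Identify Lyzr CEO", "intent": "semantic"},
    {"text": "Find path between user_X and OpenAI", "intent": "relation"},
]
chosen = [route(q) for q in plan]
```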
2. Tool Implementation with Guardrails:
- Vector Tool: Top-k retrieval using embedding similarity.
- Graph Tool (Text-to-Cypher): Uses LLM to generate Cypher.
- Hallucination Guard: The tool injects the strict allowed schema into the prompt. Generated Cypher is parsed and validated against a "Relationship Whitelist" before execution to prevent schema injection or invalid edge types.
- Filter Tool: Converts natural language to structured DB filters (WHERE clauses).
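A minimal version of the relationship-whitelist check could look like this. A production guard would use a real Cypher parser; the regex keeps the sketch small and the whitelist contents are invented for the example.

```python
# Relationship-whitelist guard for generated Cypher: extract every relationship
# type with a regex and reject queries that use an edge outside the schema.
import re

REL_PATTERN = re.compile(r"\[\s*\w*\s*:\s*(\w+)")


def validate_cypher(cypher: str, allowed_rels: set) -> bool:
    """True iff every relationship type in the query is whitelisted."""
    used = set(REL_PATTERN.findall(cypher))
    return used <= allowed_rels


ALLOWED = {"MENTIONS", "WORKS_AT"}
good = "MATCH (c:Chunk)-[:MENTIONS]->(e:Entity) RETURN e"
bad = "MATCH (a)-[r:OWNS]->(b) RETURN a, b"
```

Rejected queries never reach the database; the error can instead be fed back to the LLM for regeneration.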
3. Latency & Performance Strategy:
- Timeouts: Hard limit on agent reasoning steps (e.g., max 5 loops).
- Fallback: If Graph tool fails or times out, degrade gracefully to pure Vector Search for a "best effort" answer.
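The timeout-plus-fallback behaviour can be sketched with `asyncio.wait_for`. Both tools are stubs (the slow graph tool just sleeps past its budget); the timeout value is illustrative.

```python
# Graceful degradation: run the graph tool under a hard timeout, fall back to
# vector search on timeout or error. Tool implementations are stubs.
import asyncio


async def slow_graph_tool(query: str) -> str:
    await asyncio.sleep(5)  # simulates a traversal blowing its latency budget
    return "graph answer"


async def vector_tool(query: str) -> str:
    return f"best-effort vector answer for: {query}"


async def answer(query: str, graph_timeout: float = 0.05) -> str:
    try:
        return await asyncio.wait_for(slow_graph_tool(query), timeout=graph_timeout)
    except Exception:  # includes asyncio.TimeoutError: degrade, don't fail
        return await vector_tool(query)


result = asyncio.run(answer("How is the CEO of Lyzr related to OpenAI?"))
```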
C. Parity & Extensibility Layer
We define abstract base class interfaces to ensure no vendor lock-in.
```python
from abc import ABC, abstractmethod
from typing import List


class GraphStore(ABC):
    @abstractmethod
    def execute_query(self, query: str, params: dict): pass


class VectorStore(ABC):
    @abstractmethod
    def search(self, query_vector: List[float], k: int): pass


class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str): pass


# Implementations: Neo4jStore, NeptuneStore, QdrantStore, OpenAIProvider, etc.
```
4. Scalability, Security & Observability
To meet "Production-Grade" criteria, these non-functional requirements are critical:
Access Control (RBAC):
- Pre-retrieval enforcement.
- All queries filter by `user.tenant_id` or `user.permissions` to ensure users only retrieve data they are authorized to see.
Observability:
- Tracing: Log every step of the Agent's reasoning chain (Input -> Decomp -> Tool Call -> Result). This is vital for debugging "why did the bot say that?".
- Metrics: Track Token Usage, Latency p95, and Cache Hit Rates.
Async Ingestion:
- Ingestion is decoupled from the user request loop.
- File Upload API -> Pushes ID to Redis Queue -> Background Worker picks up -> Runs Extraction -> Updates Graph.
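The decoupling above can be mimicked with an in-memory queue to show the handler shapes; in production Celery + Redis replace `queue` and Neo4j replaces `graph`, but the split between "enqueue and return" and "drain and extract" is the same.

```python
# Upload/worker decoupling, simulated in-memory (Celery + Redis in production).
from collections import deque

queue: deque = deque()  # stands in for the Redis task queue
graph: dict = {}        # stands in for Neo4j


def upload_endpoint(doc_id: str, text: str) -> dict:
    """API handler: enqueue and return immediately; no extraction in-request."""
    queue.append({"doc_id": doc_id, "text": text})
    return {"status": "queued", "doc_id": doc_id}


def ingestion_worker() -> None:
    """Background worker: drain the queue and update the graph."""
    while queue:
        task = queue.popleft()
        # Chunking / extraction / resolution would run here; we record a stub node.
        graph[task["doc_id"]] = {"chunks": 1}


resp = upload_endpoint("doc1", "Lyzr builds agent frameworks.")
ingestion_worker()
```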
Caching Strategy:
- Semantic Cache (Redis): Before hitting the LLM, check whether a semantically similar query has been answered recently. This reduces cost and latency.
- Embedding Cache: Store computed embeddings to avoid re-calculation for identical text chunks.
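A toy version of the semantic cache: reuse a cached answer when the incoming query embedding is close enough to a stored one. The two-dimensional vectors and the 0.92 threshold are placeholders (production would use bge-m3 embeddings stored in Redis).

```python
# Semantic cache sketch: nearest-cached-query lookup by cosine similarity.
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))


class SemanticCache:
    def __init__(self, threshold: float = 0.92):
        self.threshold = threshold
        self.entries = []  # list of (query_embedding, answer)

    def get(self, query_vec):
        """Return a cached answer if any stored query is similar enough."""
        for vec, answer in self.entries:
            if cosine(vec, query_vec) >= self.threshold:
                return answer
        return None

    def put(self, query_vec, answer):
        self.entries.append((query_vec, answer))


cache = SemanticCache()
cache.put([1.0, 0.0], "cached answer")
hit = cache.get([0.99, 0.05])   # near-duplicate query phrasing
miss = cache.get([0.0, 1.0])    # unrelated query
```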
5. Implementation Plan
Phase 1: Foundation (Hours 1-4)
- Set up repository, Python environment, and local Neo4j/Redis services.
- Implement `GraphStore` & `VectorStore` abstractions.
- Create basic Auth & middleware logging.
Phase 2: Ingestion Engine (Hours 5-12)
- Implement PDF extractor & Async Worker skeleton.
- Build "Ontology Proposer" & "Graph Extractor" prompts.
- Implement Entity Resolution logic.
Phase 3: The Retrieval Agent (Hours 13-20)
- Set up Agent loop with Query Decomposition.
- Implement `Text2Cypher` with schema validation.
- Implement latency timeouts & fallbacks.
Phase 4: Refinement & UI (Hours 21-24)
- Build the Visual Ontology Editor (Streamlit for speed, or React/Vite per the stack selection).
- Add simple Evaluation Script (run known queries, check answers).
- Write `README.md` highlighting the "Production Thinking" (RBAC, Async, Observability).
6. Key Innovations
- Hybrid Chunk Nodes: Storing source text explicitly in the graph for ground-truth verification.
- Self-Correcting Cypher: If Cypher execution fails, feed the error back to the LLM to fix syntax automatically.
- Adaptive Retrieval: The agent assigns a "confidence score" to each retrieval method. If Vector Search confidence is low (<0.7), it automatically triggers Graph Traversal to boost context.
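The self-correcting Cypher loop above can be sketched as a bounded retry with an injected "fixer". Both `execute` and `fix` are stubs here; in the real system they would be the Neo4j driver and an LLM call that receives the failing query plus the error text.

```python
# Self-correcting Cypher: on execution error, feed the query and the error
# message back to a fixer and retry, up to a bounded number of attempts.
def run_with_repair(cypher: str, execute, fix, max_attempts: int = 3):
    last_err = None
    for _ in range(max_attempts):
        try:
            return execute(cypher)
        except Exception as err:
            last_err = err
            cypher = fix(cypher, str(err))  # LLM rewrites the query from the error
    raise RuntimeError(f"Cypher repair failed after {max_attempts} attempts: {last_err}")


# Stub environment: the first query has a typo, the 'fixer' corrects it.
def fake_execute(q):
    if "MATH" in q:
        raise ValueError("Invalid input 'MATH'")
    return ["row"]


def fake_fix(q, error):
    return q.replace("MATH", "MATCH")


rows = run_with_repair("MATH (n) RETURN n", fake_execute, fake_fix)
```

Bounding the attempts matters: an unbounded repair loop would reintroduce exactly the latency risk the timeout strategy in section 3.B guards against.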