# Graph RAG Service - Project Documentation

## System Architecture

### Overview

The Graph RAG Service is built as a modular, production-grade platform with the following key components:

1. **API Gateway (FastAPI)**: Handles all HTTP requests, authentication, and routing
2. **Ingestion Pipeline**: Processes documents and constructs knowledge graphs
3. **Retrieval Agent (LangGraph)**: Intelligent query routing and response synthesis
4. **Storage Layer**: Neo4j for graph + vector storage
5. **Task Queue**: Celery + Redis for async processing
6. **Observability**: OpenTelemetry for tracing and metrics

### Design Principles

#### 1. No Vendor Lock-in

All core components are abstracted behind interfaces:

- `GraphStore`: Can swap Neo4j for AWS Neptune
- `VectorStore`: Supports multiple vector databases
- `LLMProvider`: Works with any LLM (OpenAI, Anthropic, Gemini, Ollama)

#### 2. Production-Ready

- **Async Processing**: Non-blocking I/O for all database operations
- **Background Jobs**: Celery workers handle heavy ingestion tasks
- **Authentication**: JWT-based with RBAC support
- **Error Handling**: Graceful degradation and fallback mechanisms
- **Observability**: Full tracing and metrics collection

#### 3. Intelligent Retrieval

The agentic system:

- Decomposes complex queries into sub-queries
- Dynamically selects retrieval methods (vector vs. graph vs. Cypher)
- Validates outputs against the schema (hallucination guard)
- Provides reasoning chains for transparency

## Components Deep Dive

### Core Abstractions (`src/graph_rag_service/core/`)

#### GraphStore Interface

```python
from abc import ABC, abstractmethod
from typing import List

# Entity and Relationship are the service's domain models.
class GraphStore(ABC):
    @abstractmethod
    async def create_node(self, entity: Entity) -> str: ...

    @abstractmethod
    async def create_relationship(self, relationship: Relationship) -> str: ...

    @abstractmethod
    async def execute_query(self, query: str, params: dict) -> List[dict]: ...

    @abstractmethod
    async def find_path(self, source: str, target: str, max_depth: int) -> List[dict]: ...
```

Implementation: `Neo4jStore` provides unified graph + vector storage using Neo4j 5.x vector capabilities.

#### LLMProvider Interface

```python
class LLMProvider(ABC):
    @abstractmethod
    async def complete(self, prompt: str, **kwargs) -> str: ...

    @abstractmethod
    async def embed(self, text: str) -> List[float]: ...
```

Implementation: `UnifiedLLMProvider` wraps OpenAI, Anthropic, Gemini, and Ollama behind a consistent interface.

#### Entity Resolution

Multi-stage resolution:

1. **Blocking**: Group by entity type and name similarity (fast reject)
2. **Semantic Check**: Compare embeddings for deep similarity
3. **Threshold Matching**: Configurable thresholds (0.85 default)
4. **Auto-merge**: High-confidence merges (> 0.95)
5. **Human Review Queue**: Medium-confidence matches (0.85-0.95) flagged for review

### Ingestion Pipeline (`src/graph_rag_service/ingestion/`)

#### Flow

1. **Document Processing**: Extract text from PDF/TXT/MD/DOCX
2. **Chunking**: Split into overlapping chunks (1024 tokens, 200-token overlap)
3. **Ontology Generation**: LLM analyzes samples to propose entity/relationship types
4. **Entity Extraction**: Extract entities and relationships per chunk
5. **Entity Resolution**: Deduplicate and merge entities
6. **Embedding Generation**: Create vector embeddings (BGE-M3)
7. **Graph Construction**: Store in Neo4j with hybrid nodes

#### Hybrid Nodes

Each chunk is stored as both:

- A `(:Chunk)` node carrying the text and its embedding
- `[:MENTIONS]` relationships linking it to the `(:Entity)` nodes it references

This preserves source text for grounding while enabling abstract graph queries, as the sketches below illustrate.
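As a concrete illustration of the hybrid-node pattern, here is a minimal sketch using the official `neo4j` Python driver. The property names (`id`, `text`, `embedding`) and the sample values are illustrative assumptions, not the service's actual schema:

```python
# Sketch of hybrid-node storage: one Chunk node plus MENTIONS edges.
# Label and property names are assumptions for illustration only.
import asyncio
from neo4j import AsyncGraphDatabase

STORE_CHUNK = """
MERGE (c:Chunk {id: $chunk_id})
SET c.text = $text, c.embedding = $embedding
WITH c
UNWIND $entities AS entity_name
MERGE (e:Entity {name: entity_name})
MERGE (c)-[:MENTIONS]->(e)
"""

async def store_chunk(uri: str, auth: tuple, chunk_id: str,
                      text: str, embedding: list[float],
                      entities: list[str]) -> None:
    driver = AsyncGraphDatabase.driver(uri, auth=auth)
    try:
        # Write the chunk node and all of its MENTIONS edges in one statement.
        await driver.execute_query(
            STORE_CHUNK,
            chunk_id=chunk_id, text=text,
            embedding=embedding, entities=entities,
        )
    finally:
        await driver.close()

asyncio.run(store_chunk(
    "bolt://localhost:7687", ("neo4j", "password"),
    chunk_id="doc1-0001",
    text="Acme Corp acquired Widget Inc in 2021.",
    embedding=[0.12, -0.03, 0.88],  # truncated for illustration
    entities=["Acme Corp", "Widget Inc"],
))
```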
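Similarly, the entity-resolution thresholds from the pipeline above reduce to a small decision rule. A sketch, assuming cosine similarity over entity embeddings (the function and outcome labels are illustrative):

```python
import numpy as np

AUTO_MERGE_THRESHOLD = 0.95   # high confidence: merge automatically
RESOLUTION_THRESHOLD = 0.85   # ENTITY_RESOLUTION_THRESHOLD default

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def resolve(existing: np.ndarray, candidate: np.ndarray) -> str:
    """Classify a candidate entity pair by embedding similarity."""
    score = cosine_similarity(existing, candidate)
    if score > AUTO_MERGE_THRESHOLD:
        return "auto_merge"      # stage 4: merge without review
    if score >= RESOLUTION_THRESHOLD:
        return "human_review"    # stage 5: flag for the review queue
    return "keep_separate"       # below threshold: distinct entities
```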
### Retrieval System (`src/graph_rag_service/retrieval/`)

#### Tools

1. **VectorSearchTool**: Semantic similarity using embeddings
2. **GraphTraversalTool**: Relationship exploration and path finding
3. **CypherGenerationTool**: Text-to-Cypher with validation
4. **MetadataFilterTool**: Structured queries on attributes

#### Agent Workflow (LangGraph)

```
[Query] → [Decompose] → [Route] → [Vector/Graph/Cypher] → [Synthesize] → [Response]
               ↑                                                │
               └────────────────────────────────────────────────┘
                            (iterative refinement)
```

#### Hallucination Guards

- **Schema Injection**: Prompt includes allowed entity/relationship types
- **Cypher Validation**: Parse and validate against a whitelist
- **Self-Correction**: Feed errors back to the LLM to fix syntax
- **Fallback**: If graph retrieval fails, degrade to vector search

### API Layer (`src/graph_rag_service/api/`)

#### Endpoints

- `POST /api/auth/login`: Get JWT token
- `POST /api/documents/upload`: Upload document (returns task ID)
- `GET /api/documents/status/{task_id}`: Check ingestion progress
- `POST /api/query`: Execute agentic query
- `GET /api/ontology`: Get current ontology schema
- `PUT /api/ontology`: Update ontology (admin only)
- `GET /api/graph/visualization`: Get graph data for visualization
- `GET /api/system/health`: System health check
- `GET /api/system/stats`: System statistics

#### Authentication

- JWT tokens with configurable expiration (default: 30 minutes)
- RBAC with scopes: `read`, `write`, `admin`
- Dependency injection for protected endpoints

### Workers (`src/graph_rag_service/workers/`)

#### Celery Tasks

- `ingest_document`: Process a single document
- `ingest_documents_batch`: Process multiple documents
- `health_check`: Worker health verification

#### Configuration

- Broker: Redis
- Result backend: Redis
- Serializer: JSON
- Task timeout: 1 hour (configurable)

### Observability (`src/graph_rag_service/observability/`)

#### OpenTelemetry Integration

- **Traces**: Agent reasoning steps, tool calls, database queries
- **Metrics**:
  - `documents_ingested`: Counter
  - `queries_executed`: Counter
  - `query_duration_seconds`: Histogram
  - `entities_extracted`: Counter

#### Structured Logging

- Log level: INFO (configurable)
- Format: `%(asctime)s - %(name)s - %(levelname)s - %(message)s`
- All async operations logged with context

## Configuration

### Environment Variables

Key settings in `.env`:

- **Neo4j**: `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD`
- **Redis**: `REDIS_HOST`, `REDIS_PORT`
- **LLM Provider**: `DEFAULT_LLM_PROVIDER` (openai/anthropic/gemini/ollama)
- **API Keys**: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`
- **Ollama**: `OLLAMA_BASE_URL`, `OLLAMA_MODEL`, `OLLAMA_EMBEDDING_MODEL`
- **Security**: `SECRET_KEY`, `ACCESS_TOKEN_EXPIRE_MINUTES`

### Tuning Parameters

- `CHUNK_SIZE`: 1024 (text chunk size in tokens)
- `CHUNK_OVERLAP`: 200 (overlap between chunks)
- `MAX_AGENT_ITERATIONS`: 5 (max reasoning steps)
- `AGENT_TIMEOUT_SECONDS`: 30 (query timeout)
- `ENTITY_RESOLUTION_THRESHOLD`: 0.85 (similarity threshold)
- `DEFAULT_TOP_K`: 5 (retrieval results)
- `GRAPH_MAX_DEPTH`: 3 (graph traversal depth)

## Deployment

### Local Development

```bash
# 1. Ensure Neo4j and Redis are running
# 2. Configure .env with connection details

# 3. Start the API server
./start-server.sh    # or start-server.bat on Windows

# 4. Start the workers
./start-worker.sh    # or start-worker.bat on Windows
```
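With the API server and a worker running, a quick end-to-end smoke test might look like the sketch below. The payload and response field names (`username`, `access_token`, `task_id`, `question`) are assumptions about the request/response shape, not a documented contract:

```python
import requests

BASE = "http://localhost:8000"

# 1. Authenticate to obtain a JWT (field names are illustrative).
resp = requests.post(f"{BASE}/api/auth/login",
                     data={"username": "admin", "password": "secret"})
headers = {"Authorization": f"Bearer {resp.json()['access_token']}"}

# 2. Upload a document; the endpoint returns a Celery task ID.
with open("report.pdf", "rb") as f:
    task = requests.post(f"{BASE}/api/documents/upload",
                         files={"file": f}, headers=headers).json()

# 3. Poll ingestion progress.
status = requests.get(f"{BASE}/api/documents/status/{task['task_id']}",
                      headers=headers).json()
print(status)

# 4. Run an agentic query once ingestion completes.
answer = requests.post(f"{BASE}/api/query",
                       json={"question": "Who acquired Widget Inc?"},
                       headers=headers).json()
print(answer)
```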
### Production Considerations

1. **Database**: Use managed Neo4j (Aura) or a self-hosted cluster
2. **Redis**: Use managed Redis (AWS ElastiCache, Redis Cloud)
3. **Worker Scaling**: Add more Celery workers based on ingestion load
4. **API Scaling**: Run multiple API instances behind a load balancer
5. **Monitoring**: Integrate with Prometheus/Grafana for metrics
6. **Secrets**: Use secret management (AWS Secrets Manager, HashiCorp Vault)

## Extensibility

### Adding a New LLM Provider

1. Implement the `LLMProvider` interface
2. Add it to the `LLMFactory.create()` method
3. Update the config with the new provider's settings

### Adding a New Graph Database

1. Implement the `GraphStore` interface
2. Update `IngestionPipeline` to use the new store
3. Test against the existing workflows

### Custom Retrieval Tools

1. Create a new tool class with a `run()` method
2. Add it to `AgentRetrievalSystem.tools`
3. Update the routing logic in `_route_query()`

## Testing Strategy

### Unit Tests

- Test each component independently
- Mock external dependencies (Neo4j, Redis, LLMs)
- Focus on business logic

### Integration Tests

- Test component interactions
- Use test database instances
- Verify end-to-end flows

### Performance Tests

- Benchmark ingestion throughput
- Measure query latencies
- Stress-test with concurrent requests

## Future Enhancements

### Phase 1 (Current MVP)

- ✅ Core ingestion pipeline
- ✅ Agentic retrieval system
- ✅ Multi-LLM support
- ✅ Entity resolution
- ✅ Async workers

### Phase 2 (Next Steps)

- [ ] React frontend with visual ontology editor
- [ ] Graph visualization (D3.js/Cytoscape)
- [ ] Advanced ontology evolution with migrations
- [ ] Semantic cache with Redis
- [ ] Batch ingestion optimization

### Phase 3 (Advanced Features)

- [ ] Multi-tenant support with data isolation
- [ ] Fine-tuned entity extraction models
- [ ] Graph neural network embeddings
- [ ] Automated ontology quality metrics
- [ ] Export/import of ontology schemas

## Troubleshooting

### Common Issues

#### Neo4j Connection Failed

- Verify Neo4j is running and accessible
- Verify the credentials in `.env`
- Try connecting with cypher-shell: `cypher-shell -u neo4j -p password`

#### Celery Worker Not Processing

- Check that Redis is running: `redis-cli ping`
- Verify the broker URL in `.env`
- Check the worker logs

#### Ollama Models Not Found

- Pull the models: `ollama pull llama3.2 && ollama pull bge-m3`
- Verify Ollama is running: `curl http://localhost:11434/api/tags`

#### Query Returns No Results

- Verify documents are ingested: `GET /api/system/stats`
- Check that an ontology exists: `GET /api/ontology`
- Try simpler queries first

## Support

For issues or questions:

1. Check the documentation and troubleshooting guide
2. Search existing GitHub issues
3. Open a new issue with:
   - A clear description
   - Steps to reproduce
   - Environment details
   - Relevant logs

---

**Last Updated**: February 2026
**Version**: 0.1.0