# Graph RAG Service - Project Documentation

## System Architecture

### Overview
The Graph RAG Service is built as a modular, production-grade platform with the following key components:

1. **API Gateway (FastAPI)**: Handles all HTTP requests, authentication, and routing
2. **Ingestion Pipeline**: Processes documents and constructs knowledge graphs
3. **Retrieval Agent (LangGraph)**: Intelligent query routing and response synthesis
4. **Storage Layer**: Neo4j for graph + vector storage
5. **Task Queue**: Celery + Redis for async processing
6. **Observability**: OpenTelemetry for tracing and metrics

### Design Principles

#### 1. No Vendor Lock-in
All core components are abstracted behind interfaces:
- `GraphStore`: Allows swapping Neo4j for AWS Neptune
- `VectorStore`: Supports multiple vector databases
- `LLMProvider`: Works with multiple LLM providers (OpenAI, Anthropic, Gemini, Ollama)

#### 2. Production-Ready
- **Async Processing**: Non-blocking I/O for all database operations
- **Background Jobs**: Celery workers handle heavy ingestion tasks
- **Authentication**: JWT-based, with RBAC support
- **Error Handling**: Graceful degradation and fallback mechanisms
- **Observability**: Full tracing and metrics collection

#### 3. Intelligent Retrieval
The agentic retrieval system:
- Decomposes complex queries into sub-queries
- Dynamically selects a retrieval method (vector search, graph traversal, or Cypher generation)
- Validates outputs against the ontology schema (hallucination guard)
- Provides reasoning chains for transparency

## Components Deep Dive

### Core Abstractions (`src/graph_rag_service/core/`)

#### GraphStore Interface
```python
from __future__ import annotations  # Entity/Relationship: project models defined elsewhere

from abc import ABC, abstractmethod
from typing import Any, Dict, List


class GraphStore(ABC):
    # All methods are async so implementations can use non-blocking drivers.
    @abstractmethod
    async def create_node(self, entity: Entity) -> str: ...

    @abstractmethod
    async def create_relationship(self, relationship: Relationship) -> str: ...

    @abstractmethod
    async def execute_query(self, query: str, params: Dict[str, Any]) -> List[dict]: ...

    @abstractmethod
    async def find_path(self, source: str, target: str, max_depth: int) -> List[dict]: ...
```

Implementation: `Neo4jStore` provides unified graph + vector storage using the vector index support in Neo4j 5.x.

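A quick way to see the `GraphStore` contract in action, and the kind of test double the unit-test strategy calls for, is an in-memory implementation. The sketch below is hypothetical: `InMemoryGraphStore`, the simplified `Entity`/`Relationship` dataclasses, and the hop-record format returned by `find_path` are assumptions, not the project's real models. It redeclares a trimmed interface (without `execute_query`, which is backend-specific) so it runs standalone, and implements `find_path` as a breadth-first search bounded by `max_depth`.

```python
import asyncio
import uuid
from abc import ABC, abstractmethod
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    """Simplified stand-in for the project's Entity model (assumption)."""
    name: str
    type: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class Relationship:
    """Simplified stand-in for the project's Relationship model (assumption)."""
    source_id: str
    target_id: str
    type: str


class GraphStore(ABC):
    """Trimmed copy of the interface (execute_query omitted) so the sketch runs standalone."""

    @abstractmethod
    async def create_node(self, entity: Entity) -> str: ...

    @abstractmethod
    async def create_relationship(self, relationship: Relationship) -> str: ...

    @abstractmethod
    async def find_path(self, source: str, target: str, max_depth: int) -> List[dict]: ...


class InMemoryGraphStore(GraphStore):
    """Hypothetical test double that keeps nodes and edges in plain dicts."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Entity] = {}
        # node id -> list of (neighbour id, relationship type)
        self.adjacency: Dict[str, List[Tuple[str, str]]] = {}

    async def create_node(self, entity: Entity) -> str:
        self.nodes[entity.id] = entity
        self.adjacency.setdefault(entity.id, [])
        return entity.id

    async def create_relationship(self, relationship: Relationship) -> str:
        # Record the edge in both directions so traversal ignores direction.
        self.adjacency[relationship.source_id].append((relationship.target_id, relationship.type))
        self.adjacency[relationship.target_id].append((relationship.source_id, relationship.type))
        return f"{relationship.source_id}-{relationship.type}-{relationship.target_id}"

    async def find_path(self, source: str, target: str, max_depth: int) -> List[dict]:
        # Breadth-first search: the first time target is dequeued, it was reached by a shortest path.
        queue = deque([(source, [])])
        visited = {source}
        while queue:
            node, hops = queue.popleft()
            if node == target:
                return hops
            if len(hops) >= max_depth:
                continue
            for neighbour, rel_type in self.adjacency.get(node, []):
                if neighbour not in visited:
                    visited.add(neighbour)
                    queue.append((neighbour, hops + [{"from": node, "type": rel_type, "to": neighbour}]))
        return []  # no path within max_depth hops


async def demo() -> None:
    store = InMemoryGraphStore()
    alice = await store.create_node(Entity("Alice", "Person"))
    acme = await store.create_node(Entity("Acme", "Organization"))
    await store.create_relationship(Relationship(alice, acme, "WORKS_AT"))
    print(await store.find_path(alice, acme, max_depth=3))


if __name__ == "__main__":
    asyncio.run(demo())
```

Because BFS explores one hop level at a time, the first path found is a shortest one; a Neo4j-backed store would presumably hand the same question to Cypher instead.
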
#### LLMProvider Interface
```python
from abc import ABC, abstractmethod
from typing import List


class LLMProvider(ABC):
    @abstractmethod
    async def complete(self, prompt: str, **kwargs) -> str: ...  # text completion

    @abstractmethod
    async def embed(self, text: str) -> List[float]: ...  # embedding vector
```

Implementation: `UnifiedLLMProvider` wraps OpenAI, Anthropic, Gemini, and Ollama behind a consistent interface.

#### Entity Resolution
Entity resolution runs in multiple stages:
1. **Blocking**: Group candidates by entity type and name similarity so obvious non-matches are rejected cheaply
2. **Semantic Check**: Compare embeddings of the remaining candidate pairs
3. **Threshold Matching**: Apply configurable similarity thresholds (default 0.85)
4. **Auto-merge**: Merge high-confidence matches (similarity > 0.95) automatically
5. **Human Review Queue**: Flag medium-confidence matches (0.85–0.95) for review

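The stages can be sketched as a single pairwise decision function. This is an illustration under stated assumptions: `Candidate`, `resolve`, the use of `difflib` name similarity for blocking, and its 0.5 cutoff are hypothetical; only the 0.85 and 0.95 thresholds come from the list above. A score of exactly 0.95 goes to review, matching the 0.85–0.95 review band.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Sequence

AUTO_MERGE_THRESHOLD = 0.95   # above this: merge automatically
REVIEW_THRESHOLD = 0.85       # ENTITY_RESOLUTION_THRESHOLD default
NAME_BLOCK_THRESHOLD = 0.5    # hypothetical cutoff for the cheap name pre-filter


@dataclass
class Candidate:
    name: str
    type: str
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def same_block(a: Candidate, b: Candidate) -> bool:
    # Blocking: only compare entities of the same type with roughly similar names.
    if a.type != b.type:
        return False
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio() >= NAME_BLOCK_THRESHOLD


def resolve(a: Candidate, b: Candidate) -> str:
    """Return 'merge', 'review', or 'distinct' for a candidate pair."""
    if not same_block(a, b):
        return "distinct"                 # fast reject, no embedding comparison
    score = cosine(a.embedding, b.embedding)
    if score > AUTO_MERGE_THRESHOLD:
        return "merge"
    if score >= REVIEW_THRESHOLD:
        return "review"                   # queued for a human decision
    return "distinct"
```

In a real pipeline, blocking would group entities into buckets up front so that only pairs inside a bucket are ever compared; `same_block` is the pairwise equivalent of that grouping.
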
### Ingestion Pipeline (`src/graph_rag_service/ingestion/`)

#### Flow
1. **Document Processing**: Extract text from PDF/TXT/MD/DOCX files
2. **Chunking**: Split text into overlapping chunks (1024 tokens, 200-token overlap)
3. **Ontology Generation**: An LLM analyzes sample chunks and proposes entity and relationship types
4. **Entity Extraction**: Extract entities and relationships from each chunk
5. **Entity Resolution**: Deduplicate and merge entities
6. **Embedding Generation**: Create vector embeddings (BGE-M3)
7. **Graph Construction**: Store chunks and entities in Neo4j as hybrid nodes

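Overlapping chunking is a sliding window whose stride is `chunk_size - overlap` (1024 - 200 = 824 tokens with the defaults). A minimal sketch, assuming the text has already been tokenized; `chunk_tokens` is a hypothetical name, and the whitespace splitting mentioned below is only a stand-in for whatever tokenizer the pipeline actually uses.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 1024, overlap: int = 200) -> List[List[str]]:
    """Split tokens into windows of chunk_size; each window repeats the last `overlap` tokens of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `[" ".join(c) for c in chunk_tokens(text.split())]` yields chunk strings; a 2,000-token document produces three chunks of 1024, 1024, and 352 tokens.
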
#### Hybrid Nodes
Each chunk is stored as a `(:Chunk)` node that:
- Holds the chunk text and its embedding
- Is connected to the `(:Entity)` nodes it mentions via `[:MENTIONS]` relationships

This preserves the source text for grounding while still enabling abstract graph queries over entities.

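Writing one hybrid node could look like the parameterized Cypher below. This is a hypothetical sketch: `chunk_upsert`, the property names (`id`, `text`, `embedding`, `name`), and keying entities by `name` are assumptions about the schema, not confirmed details of `Neo4jStore`.

```python
from typing import Any, Dict, List, Tuple

# Upsert one chunk, then link it to every entity it mentions.
UPSERT_CHUNK = """
MERGE (c:Chunk {id: $chunk_id})
SET c.text = $text, c.embedding = $embedding
WITH c
UNWIND $entity_names AS entity_name
MERGE (e:Entity {name: entity_name})
MERGE (c)-[:MENTIONS]->(e)
"""


def chunk_upsert(chunk_id: str, text: str, embedding: List[float],
                 entity_names: List[str]) -> Tuple[str, Dict[str, Any]]:
    """Return the Cypher statement and its parameters for one hybrid chunk node."""
    params = {
        "chunk_id": chunk_id,
        "text": text,
        "embedding": embedding,
        "entity_names": sorted(set(entity_names)),  # de-duplicate before UNWIND
    }
    return UPSERT_CHUNK, params
```

With the official async Neo4j driver, the pair can be passed to `session.run(query, params)`. Using `MERGE` rather than `CREATE` keeps re-ingestion idempotent, and because the chunk is merged before `UNWIND`, a chunk that mentions no entities is still stored.
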
### Retrieval System (`src/graph_rag_service/retrieval/`)

#### Tools
1. **VectorSearchTool**: Semantic similarity using embeddings
2. **GraphTraversalTool**: Relationship exploration and path finding
3. **CypherGenerationTool**: Text-to-Cypher with validation
4. **MetadataFilterTool**: Structured queries on attributes

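The first two tools ultimately issue Cypher against Neo4j. The builders below are a hypothetical sketch of what those queries could look like, using Neo4j's `db.index.vector.queryNodes` procedure (available from Neo4j 5.11). They assume a vector index named `chunk_embeddings` on `:Chunk(embedding)` already exists (1024 dimensions for BGE-M3 dense vectors) and that entities are keyed by a `name` property; neither detail is confirmed by this document, and the depth cap of 10 is an arbitrary safety limit.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical names: index "chunk_embeddings" over :Chunk(embedding); entities keyed by `name`.
VECTOR_QUERY = """
CALL db.index.vector.queryNodes($index_name, $top_k, $embedding)
YIELD node, score
RETURN node.id AS chunk_id, node.text AS text, score
ORDER BY score DESC
"""


def vector_search_query(embedding: List[float], top_k: int = 5,
                        index_name: str = "chunk_embeddings") -> Tuple[str, Dict[str, Any]]:
    """VectorSearchTool: top_k nearest chunks to a query embedding (DEFAULT_TOP_K = 5)."""
    return VECTOR_QUERY, {"index_name": index_name, "top_k": top_k, "embedding": embedding}


def path_query(max_depth: int = 3) -> str:
    """GraphTraversalTool: shortest path between two entities, bounded by GRAPH_MAX_DEPTH."""
    # Variable-length bounds cannot be passed as query parameters, so the depth is
    # interpolated into the text; validate it strictly to rule out injection.
    if isinstance(max_depth, bool) or not isinstance(max_depth, int) or not 1 <= max_depth <= 10:
        raise ValueError("max_depth must be an integer between 1 and 10")
    return (
        "MATCH (a:Entity {name: $source}), (b:Entity {name: $target}) "
        f"MATCH p = shortestPath((a)-[*..{max_depth}]-(b)) "
        "RETURN [n IN nodes(p) | n.name] AS names, [r IN relationships(p) | type(r)] AS rel_types"
    )
```
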
#### Agent Workflow (LangGraph)
```
[Query] → [Decompose] → [Route] → [Vector/Graph/Cypher] → [Synthesize] → [Response]
                           ↑                                   │
                           └───────────────────────────────────┘
                                  (Iterative refinement)
```

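The control flow in the diagram can be sketched in plain Python, without LangGraph, to make the refinement loop explicit. Everything here is hypothetical: `decompose` and `route` are keyword placeholders for what the service does with LLM calls, and `is_sufficient`/`refine` are injected callables standing in for the agent's self-assessment. The loop is capped by `MAX_AGENT_ITERATIONS` (5).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MAX_AGENT_ITERATIONS = 5  # mirrors the MAX_AGENT_ITERATIONS tuning parameter

Tool = Callable[[str], List[str]]  # hypothetical: sub-query in, evidence snippets out


@dataclass
class AgentState:
    query: str
    sub_queries: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)
    answer: str = ""
    iterations: int = 0


def decompose(query: str) -> List[str]:
    # Placeholder for the LLM decomposition step: split on " and ".
    return [part.strip() for part in query.split(" and ") if part.strip()]


def route(sub_query: str) -> str:
    # Placeholder keyword router standing in for the LLM routing decision.
    q = sub_query.lower()
    if q.startswith(("how many", "count")):
        return "cypher"      # aggregations suit generated Cypher
    if any(word in q for word in ("related", "connected", "path")):
        return "graph"       # relationship questions suit traversal
    return "vector"          # default: semantic similarity


def run_agent(query: str, tools: Dict[str, Tool],
              is_sufficient: Callable[[AgentState], bool],
              refine: Callable[[AgentState], List[str]]) -> AgentState:
    state = AgentState(query=query, sub_queries=decompose(query))   # [Decompose]
    while state.iterations < MAX_AGENT_ITERATIONS:
        state.iterations += 1
        for sub_query in state.sub_queries:                         # [Route] -> tool
            state.evidence.extend(tools[route(sub_query)](sub_query))
        state.answer = "\n".join(state.evidence)                    # [Synthesize] placeholder
        if is_sufficient(state):
            break                                                   # -> [Response]
        state.sub_queries = refine(state)                           # refinement loops back to [Route]
    return state
```

In LangGraph itself, each step would become a node on a `StateGraph`, and the refinement arrow would be a conditional edge (`add_conditional_edges`) from the synthesis node back to routing.
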
#### Hallucination Guards
- **Schema Injection**: The prompt includes the allowed entity and relationship types
- **Cypher Validation**: Generated Cypher is parsed and validated against a whitelist
- **Self-Correction**: Validation and syntax errors are fed back to the LLM so it can fix the query
- **Fallback**: If graph retrieval fails, the agent degrades to vector search

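A lightweight version of the whitelist check might look like this. It is a regex heuristic, not a Cypher parser, and `validate_cypher` is a hypothetical name: it rejects write clauses and flags labels or relationship types outside the ontology, but only inspects the first label or type in each pattern. The returned list of problems is the kind of feedback the self-correction step could send back to the LLM; running `EXPLAIN <query>` against Neo4j is a more reliable syntax check, since it plans the query without executing it.

```python
import re
from typing import Iterable, List

# Clauses that modify data or schema; generated retrieval queries should be read-only.
WRITE_CLAUSES = re.compile(r"\b(CREATE|MERGE|DELETE|SET|REMOVE|DROP|LOAD\s+CSV)\b", re.IGNORECASE)
# First label in a node pattern, e.g. "(p:Person" or "(:Person".
LABEL_RE = re.compile(r"\(\s*\w*\s*:\s*`?(\w+)`?")
# First type in a relationship pattern, e.g. "[r:WORKS_AT" or "[:MENTIONS".
REL_TYPE_RE = re.compile(r"\[\s*\w*\s*:\s*`?(\w+)`?")


def validate_cypher(query: str, allowed_labels: Iterable[str],
                    allowed_rel_types: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the query passed."""
    problems = []
    if WRITE_CLAUSES.search(query):
        problems.append("write or admin clause not allowed")
    labels, rel_types = set(allowed_labels), set(allowed_rel_types)
    for label in LABEL_RE.findall(query):
        if label not in labels:
            problems.append(f"unknown label: {label}")
    for rel_type in REL_TYPE_RE.findall(query):
        if rel_type not in rel_types:
            problems.append(f"unknown relationship type: {rel_type}")
    return problems
```
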
### API Layer (`src/graph_rag_service/api/`)

#### Endpoints
- `POST /api/auth/login`: Get JWT token
- `POST /api/documents/upload`: Upload document (returns task ID)
- `GET /api/documents/status/{task_id}`: Check ingestion progress
- `POST /api/query`: Execute agentic query
- `GET /api/ontology`: Get current ontology schema
- `PUT /api/ontology`: Update ontology (admin only)
- `GET /api/graph/visualization`: Get graph data for visualization
- `GET /api/system/health`: System health check
- `GET /api/system/stats`: System statistics

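Because upload returns a task ID rather than a result, clients poll the status endpoint. A minimal polling sketch follows; it assumes the status response is JSON with a `status` field holding Celery state names and that the JWT is sent as a Bearer token. Both are assumptions, as are the helper names `http_fetch_status` and `wait_for_task`.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

READY_STATES = {"SUCCESS", "FAILURE", "REVOKED"}  # Celery states after which a task no longer changes


def http_fetch_status(base_url: str, token: str) -> Callable[[str], Dict[str, Any]]:
    """Build a fetcher for GET /api/documents/status/{task_id} (Bearer auth is assumed)."""
    def fetch(task_id: str) -> Dict[str, Any]:
        request = urllib.request.Request(
            f"{base_url}/api/documents/status/{task_id}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch


def wait_for_task(fetch_status: Callable[[str], Dict[str, Any]], task_id: str,
                  timeout: float = 600.0, interval: float = 2.0) -> Dict[str, Any]:
    """Poll until the task reaches a ready state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(task_id)
        if status.get("status") in READY_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status.get('status')} after {timeout}s")
        time.sleep(interval)
```

For instance, `wait_for_task(http_fetch_status("http://localhost:8000", token), task_id)` (the address is a hypothetical local deployment) blocks until the task reaches `SUCCESS`, `FAILURE`, or `REVOKED`. Injecting the fetcher keeps the polling logic testable without a running server.
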
#### Authentication
- JWT tokens with configurable expiration (default: 30 min)
- RBAC with scopes: `read`, `write`, `admin`
- Protected endpoints enforce authentication via dependency injection

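To make the token mechanics concrete, here is a stdlib-only sketch of an HS256 JWT carrying an expiry and a list of scopes. It is illustrative only: the service's actual token code is not shown in this document and would normally rely on a maintained JWT library. `create_token`, `verify_token`, the `scopes` claim name, and treating scopes literally (without `admin` implying `read` or `write`) are all assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Iterable, List


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))  # restore stripped padding


def create_token(subject: str, scopes: List[str], secret: str, expire_minutes: int = 30) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "scopes": scopes, "exp": int(time.time()) + expire_minutes * 60}
    signing_input = f"{_b64url(json.dumps(header).encode())}.{_b64url(json.dumps(payload).encode())}"
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: str, required_scopes: Iterable[str] = ()) -> dict:
    """Check signature, expiry, and scopes; return the payload or raise PermissionError."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64url(expected), signature):  # constant-time comparison
        raise PermissionError("invalid signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if payload["exp"] < time.time():
        raise PermissionError("token expired")
    missing = set(required_scopes) - set(payload.get("scopes", []))
    if missing:
        raise PermissionError(f"missing scopes: {sorted(missing)}")
    return payload
```

In a FastAPI app, a check like `verify_token` is what a dependency would run before a protected handler executes; FastAPI's `SecurityScopes` can supply the per-endpoint `required_scopes`.
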
### Workers (`src/graph_rag_service/workers/`)

#### Celery Tasks
- `ingest_document`: Process single document
- `ingest_documents_batch`: Process multiple documents
- `health_check`: Worker health verification

#### Configuration
- Broker: Redis
- Result Backend: Redis
- Serializer: JSON
- Task timeout: 1 hour (configurable)

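The worker settings map directly onto Celery's lowercase configuration keys. A sketch, assuming the broker and result backend share Redis database 0 and that the timeout override is read from a hypothetical `CELERY_TASK_TIME_LIMIT` variable (not one of the documented settings):

```python
import os

# Broker and result backend share one Redis instance (database 0 is an assumption).
REDIS_URL = f"redis://{os.getenv('REDIS_HOST', 'localhost')}:{os.getenv('REDIS_PORT', '6379')}/0"

CELERY_CONFIG = {
    "broker_url": REDIS_URL,
    "result_backend": REDIS_URL,
    "task_serializer": "json",
    "result_serializer": "json",
    "accept_content": ["json"],
    # Hard time limit in seconds (1 hour); CELERY_TASK_TIME_LIMIT is a hypothetical override.
    "task_time_limit": int(os.getenv("CELERY_TASK_TIME_LIMIT", "3600")),
}
```

The dict can be applied with `celery_app.conf.update(CELERY_CONFIG)`. `task_time_limit` is a hard limit that kills the worker process running the task; Celery's `task_soft_time_limit` can be set slightly lower so the task first receives `SoftTimeLimitExceeded` and has a chance to clean up.
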
### Observability (`src/graph_rag_service/observability/`)

#### OpenTelemetry Integration
- **Traces**: Agent reasoning steps, tool calls, database queries
- **Metrics**:
  - `documents_ingested`: Counter
  - `queries_executed`: Counter
  - `query_duration_seconds`: Histogram
  - `entities_extracted`: Counter

#### Structured Logging
- Log level: INFO (configurable)
- Format: `%(asctime)s - %(name)s - %(levelname)s - %(message)s`
- All async operations logged with context

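The logging setup above can be reproduced with the standard library. This sketch is an assumption about how it might be wired: `configure_logging` applies the documented format and level, and the hypothetical `ContextAdapter` prefixes each message with key/value context such as a task ID.

```python
import logging

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def configure_logging(level: str = "INFO") -> None:
    """Apply the documented format; unknown level names fall back to INFO."""
    logging.basicConfig(level=getattr(logging, level.upper(), logging.INFO),
                        format=LOG_FORMAT, force=True)


class ContextAdapter(logging.LoggerAdapter):
    """Hypothetical adapter that prefixes each message with key=value context."""

    def process(self, msg, kwargs):
        context = " ".join(f"{key}={value}" for key, value in self.extra.items())
        return f"[{context}] {msg}", kwargs
```

Usage: `ContextAdapter(logging.getLogger(__name__), {"task_id": task_id}).info("chunk stored")` emits a line ending in `- INFO - [task_id=...] chunk stored`, so every message from one ingestion task can be grepped together.
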
## Configuration

### Environment Variables
Key settings in `.env`:
- **Neo4j**: `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD`
- **Redis**: `REDIS_HOST`, `REDIS_PORT`
- **LLM Provider**: `DEFAULT_LLM_PROVIDER` (openai/anthropic/gemini/ollama)
- **API Keys**: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`
- **Ollama**: `OLLAMA_BASE_URL`, `OLLAMA_MODEL`, `OLLAMA_EMBEDDING_MODEL`
- **Security**: `SECRET_KEY`, `ACCESS_TOKEN_EXPIRE_MINUTES`

### Tuning Parameters
- `CHUNK_SIZE`: 1024 (text chunk size, in tokens)
- `CHUNK_OVERLAP`: 200 (overlap between consecutive chunks, in tokens)
- `MAX_AGENT_ITERATIONS`: 5 (maximum reasoning steps)
- `AGENT_TIMEOUT_SECONDS`: 30 (query timeout)
- `ENTITY_RESOLUTION_THRESHOLD`: 0.85 (similarity threshold)
- `DEFAULT_TOP_K`: 5 (number of retrieval results)
- `GRAPH_MAX_DEPTH`: 3 (maximum graph traversal depth)

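One way to load these parameters with their documented defaults, overridable from the environment, is a frozen dataclass. This stdlib-only version is a sketch; the project may well use a settings library instead, and `TuningSettings` is a hypothetical name. It also enforces the invariant the chunker relies on (overlap smaller than chunk size).

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass(frozen=True)
class TuningSettings:
    chunk_size: int = 1024                      # tokens per chunk
    chunk_overlap: int = 200                    # tokens shared by consecutive chunks
    max_agent_iterations: int = 5
    agent_timeout_seconds: int = 30
    entity_resolution_threshold: float = 0.85
    default_top_k: int = 5
    graph_max_depth: int = 3

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "TuningSettings":
        """Override defaults from upper-case variables such as CHUNK_SIZE."""
        overrides = {}
        for f in fields(cls):
            key = f.name.upper()
            if key in env:
                overrides[f.name] = type(f.default)(env[key])  # cast to the default's type
        settings = cls(**overrides)
        if not 0 <= settings.chunk_overlap < settings.chunk_size:
            raise ValueError("CHUNK_OVERLAP must be non-negative and smaller than CHUNK_SIZE")
        return settings
```
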
## Deployment

### Local Development
```bash
# 1. Ensure Neo4j and Redis are running
# 2. Configure .env with connection details

# 3. Start API server
./start-server.sh   # or start-server.bat on Windows

# 4. Start workers
./start-worker.sh   # or start-worker.bat on Windows
```

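Step 1 can be checked automatically before starting the server. The hypothetical helper below only tests whether something is listening on the configured ports (Neo4j Bolt defaults to 7687, Redis to 6379); it does not authenticate, so a reachable port does not guarantee valid credentials.

```python
import os
import socket
from typing import Dict
from urllib.parse import urlsplit


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds, i.e. some service is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services() -> Dict[str, bool]:
    # Fall back to the standard ports when .env does not specify them.
    neo4j = urlsplit(os.getenv("NEO4J_URI", "bolt://localhost:7687"))
    targets = {
        "neo4j": (neo4j.hostname or "localhost", neo4j.port or 7687),
        "redis": (os.getenv("REDIS_HOST", "localhost"), int(os.getenv("REDIS_PORT", "6379"))),
    }
    return {name: is_reachable(host, port) for name, (host, port) in targets.items()}
```

A start script could call `check_services()` and refuse to launch the API or workers if any value is `False`.
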
### Production Considerations
1. **Database**: Use managed Neo4j (Aura) or a self-hosted cluster
2. **Redis**: Use managed Redis (AWS ElastiCache, Redis Cloud)
3. **Worker Scaling**: Add more Celery workers as ingestion load grows
4. **API Scaling**: Run multiple API instances behind a load balancer
5. **Monitoring**: Integrate with Prometheus/Grafana for metrics
6. **Secrets**: Use a secret manager (AWS Secrets Manager, HashiCorp Vault)

## Extensibility

### Adding a New LLM Provider
1. Implement the `LLMProvider` interface
2. Add the implementation to the `LLMFactory.create()` method
3. Update the configuration with the new provider's settings

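A registry-based factory is one way to make step 2 a one-line change. This sketch is hypothetical: the real `LLMFactory.create()` signature is not documented here, and `register` and the toy `EchoProvider` exist only for illustration (the latter doubles as a convenient test stand-in for a real LLM).

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class LLMProvider(ABC):
    @abstractmethod
    async def complete(self, prompt: str, **kwargs) -> str: ...

    @abstractmethod
    async def embed(self, text: str) -> List[float]: ...


class LLMFactory:
    """Hypothetical registry-based factory keyed by provider name."""

    _registry: Dict[str, Type[LLMProvider]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type[LLMProvider]], Type[LLMProvider]]:
        def decorator(provider_cls: Type[LLMProvider]) -> Type[LLMProvider]:
            cls._registry[name] = provider_cls
            return provider_cls
        return decorator

    @classmethod
    def create(cls, name: str, **settings) -> LLMProvider:
        provider_cls = cls._registry.get(name)
        if provider_cls is None:
            raise ValueError(f"unknown LLM provider: {name!r}")
        return provider_cls(**settings)


@LLMFactory.register("echo")
class EchoProvider(LLMProvider):
    """Toy provider: echoes prompts and returns constant embeddings."""

    def __init__(self, dimensions: int = 4) -> None:
        self.dimensions = dimensions

    async def complete(self, prompt: str, **kwargs) -> str:
        return f"echo: {prompt}"

    async def embed(self, text: str) -> List[float]:
        return [float(len(text))] * self.dimensions


if __name__ == "__main__":
    provider = LLMFactory.create("echo", dimensions=2)
    print(asyncio.run(provider.complete("hello")))  # echo: hello
```

With this pattern, a new provider registers itself with a decorator, and the configured `DEFAULT_LLM_PROVIDER` value could be passed straight to `create()`.
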
### Adding a New Graph Database
1. Implement the `GraphStore` interface
2. Update `IngestionPipeline` to use the new store
3. Test with existing workflows

### Custom Retrieval Tools
1. Create a new tool class with a `run()` method
2. Add it to `AgentRetrievalSystem.tools`
3. Update the routing logic in `_route_query()`

## Testing Strategy

### Unit Tests
- Test each component independently
- Mock external dependencies (Neo4j, Redis, LLMs)
- Focus on business logic

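As an example of mocking external dependencies, the sketch below tests the vector-search fallback described under Hallucination Guards with `unittest.mock.AsyncMock`. `retrieve_with_fallback` is a hypothetical helper written for this example, and the tools are assumed to expose an async `run()` method.

```python
import unittest
from unittest.mock import AsyncMock, Mock


async def retrieve_with_fallback(graph_tool, vector_tool, query: str) -> list:
    """Hypothetical helper: try graph retrieval, degrade to vector search on error or no results."""
    try:
        results = await graph_tool.run(query)
    except Exception:  # any graph/Cypher failure triggers the fallback
        results = []
    return results or await vector_tool.run(query)


class RetrieveWithFallbackTest(unittest.IsolatedAsyncioTestCase):
    async def test_falls_back_to_vector_search_when_graph_fails(self):
        graph_tool = Mock(run=AsyncMock(side_effect=RuntimeError("invalid Cypher")))
        vector_tool = Mock(run=AsyncMock(return_value=["chunk-1"]))

        self.assertEqual(await retrieve_with_fallback(graph_tool, vector_tool, "q"), ["chunk-1"])
        vector_tool.run.assert_awaited_once_with("q")

    async def test_uses_graph_results_when_available(self):
        graph_tool = Mock(run=AsyncMock(return_value=["path-1"]))
        vector_tool = Mock(run=AsyncMock())

        self.assertEqual(await retrieve_with_fallback(graph_tool, vector_tool, "q"), ["path-1"])
        vector_tool.run.assert_not_awaited()
```

Run with `python -m unittest`; no Neo4j, Redis, or LLM is needed because both tools are replaced by mocks.
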
### Integration Tests
- Test component interactions
- Use test database instances
- Verify end-to-end flows

### Performance Tests
- Benchmark ingestion throughput
- Measure query latencies
- Stress test with concurrent requests

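For latency measurements under concurrency, a small asyncio harness is often enough to start with. The sketch bounds in-flight requests with a semaphore and reports percentiles; `measure_latencies` is hypothetical, and `call` would wrap a real request to `POST /api/query` (for example, a blocking HTTP call run via `asyncio.to_thread`).

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable, Dict, List


async def measure_latencies(call: Callable[[], Awaitable[object]], total: int = 100,
                            concurrency: int = 10) -> Dict[str, float]:
    """Run `total` calls with at most `concurrency` in flight; return latency stats in seconds."""
    if total < 2:
        raise ValueError("need at least two samples for percentiles")
    semaphore = asyncio.Semaphore(concurrency)
    latencies: List[float] = []

    async def one() -> None:
        async with semaphore:
            start = time.perf_counter()
            await call()
            latencies.append(time.perf_counter() - start)

    await asyncio.gather(*(one() for _ in range(total)))
    cuts = statistics.quantiles(latencies, n=100)  # 99 cut points: cuts[49] ~ p50, cuts[94] ~ p95
    return {"p50": cuts[49], "p95": cuts[94], "max": max(latencies)}
```

Timing starts after the semaphore is acquired, so the numbers reflect service latency rather than time spent queued in the client.
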
## Future Enhancements

### Phase 1 (Current MVP)
- ✅ Core ingestion pipeline
- ✅ Agentic retrieval system
- ✅ Multi-LLM support
- ✅ Entity resolution
- ✅ Async workers

### Phase 2 (Next Steps)
- [ ] React frontend with visual ontology editor
- [ ] Graph visualization (D3.js/Cytoscape)
- [ ] Advanced ontology evolution with migrations
- [ ] Semantic cache with Redis
- [ ] Batch ingestion optimization

### Phase 3 (Advanced Features)
- [ ] Multi-tenant support with data isolation
- [ ] Fine-tuned entity extraction models
- [ ] Graph neural network embeddings
- [ ] Automated ontology quality metrics
- [ ] Export/import ontology schemas

## Troubleshooting

### Common Issues

#### Neo4j Connection Failed
- Verify that Neo4j is running and accessible
- Verify the credentials in `.env`
- Try connecting with cypher-shell: `cypher-shell -u neo4j -p <password>`

#### Celery Worker Not Processing
- Check that Redis is running: `redis-cli ping` (expect `PONG`)
- Verify the broker URL in `.env`
- Check the worker logs

#### Ollama Models Not Found
- Pull the models: `ollama pull llama3.2 && ollama pull bge-m3`
- Verify that Ollama is running: `curl http://localhost:11434/api/tags`

#### Query Returns No Results
- Verify that documents have been ingested: `GET /api/system/stats`
- Check that an ontology exists: `GET /api/ontology`
- Try simpler queries first

## Support

For issues or questions:
1. Check this documentation and the troubleshooting guide
2. Search existing GitHub issues
3. Open a new issue with:
   - Clear description
   - Steps to reproduce
   - Environment details
   - Relevant logs

---

**Last Updated**: February 2026

**Version**: 0.1.0