Graph RAG Service - Project Documentation
System Architecture
Overview
The Graph RAG Service is built as a modular, production-grade platform with the following key components:
- API Gateway (FastAPI): Handles all HTTP requests, authentication, and routing
- Ingestion Pipeline: Processes documents and constructs knowledge graphs
- Retrieval Agent (LangGraph): Intelligent query routing and response synthesis
- Storage Layer: Neo4j for graph + vector storage
- Task Queue: Celery + Redis for async processing
- Observability: OpenTelemetry for tracing and metrics
Design Principles
1. No Vendor Lock-in
All core components are abstracted behind interfaces:
- GraphStore: Can swap Neo4j for AWS Neptune
- VectorStore: Supports multiple vector databases
- LLMProvider: Works with any LLM (OpenAI, Anthropic, Gemini, Ollama)
2. Production-Ready
- Async Processing: Non-blocking I/O for all database operations
- Background Jobs: Celery workers handle heavy ingestion tasks
- Authentication: JWT-based with RBAC support
- Error Handling: Graceful degradation and fallback mechanisms
- Observability: Full tracing and metrics collection
3. Intelligent Retrieval
The agentic system:
- Decomposes complex queries into sub-queries
- Dynamically selects retrieval methods (vector vs graph vs cypher)
- Validates outputs against schema (hallucination guard)
- Provides reasoning chains for transparency
Components Deep Dive
Core Abstractions (src/graph_rag_service/core/)
GraphStore Interface
```python
from abc import ABC, abstractmethod
from typing import List

class GraphStore(ABC):
    # Entity and Relationship are the service's core domain models.
    @abstractmethod
    async def create_node(self, entity: "Entity") -> str: ...

    @abstractmethod
    async def create_relationship(self, relationship: "Relationship") -> str: ...

    @abstractmethod
    async def execute_query(self, query: str, params: dict) -> List[dict]: ...

    @abstractmethod
    async def find_path(self, source: str, target: str, max_depth: int) -> List[dict]: ...
```
Implementation: Neo4jStore provides unified graph + vector storage using Neo4j 5.x vector capabilities.
LLMProvider Interface
```python
from abc import ABC, abstractmethod
from typing import List

class LLMProvider(ABC):
    @abstractmethod
    async def complete(self, prompt: str, **kwargs) -> str: ...

    @abstractmethod
    async def embed(self, text: str) -> List[float]: ...
```
Implementation: UnifiedLLMProvider wraps OpenAI, Anthropic, Gemini, and Ollama with a consistent interface.
Entity Resolution
Multi-stage resolution:
- Blocking: Group by entity type and name similarity (fast reject)
- Semantic Check: Compare embeddings for deep similarity
- Threshold Matching: Configurable thresholds (0.85 default)
- Auto-merge: High confidence merges (>0.95)
- Human Review Queue: Medium confidence flagged for review (0.85-0.95)
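As a rough illustration, the decision logic behind the last three stages might look like the sketch below. The Entity shape, the embed callable, and the in-memory candidate list are assumptions for the example; the real pipeline resolves against the graph store.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple
import numpy as np

AUTO_MERGE_THRESHOLD = 0.95   # > 0.95: merge automatically
REVIEW_THRESHOLD = 0.85       # 0.85-0.95: flag for human review

@dataclass
class Entity:
    name: str
    type: str

def resolve(candidate: Entity, existing: List[Entity],
            embed: Callable[[str], np.ndarray]) -> Tuple[str, Optional[Entity]]:
    """Return ("merge" | "review" | "new", matched entity or None)."""
    # Blocking: only compare against entities of the same type (fast reject).
    for other in (e for e in existing if e.type == candidate.type):
        # Semantic check: cosine similarity between name embeddings.
        a, b = embed(candidate.name), embed(other.name)
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        if sim > AUTO_MERGE_THRESHOLD:
            return "merge", other        # high-confidence auto-merge
        if sim >= REVIEW_THRESHOLD:
            return "review", other       # send to the human review queue
    return "new", None                   # no plausible duplicate found
```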
Ingestion Pipeline (src/graph_rag_service/ingestion/)
Flow
1. Document Processing: Extract text from PDF/TXT/MD/DOCX
2. Chunking: Split into overlapping chunks (1024 tokens, 200 overlap)
3. Ontology Generation: LLM analyzes samples to propose entity/relationship types
4. Entity Extraction: Extract entities and relationships per chunk
5. Entity Resolution: Deduplicate and merge entities
6. Embedding Generation: Create vector embeddings (BGE-M3)
7. Graph Construction: Store in Neo4j with hybrid nodes
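For the chunking step (2 above), a minimal sketch of overlapping token windows with the default 1024/200 settings; tokenization itself is abstracted away here, and the real pipeline's chunker may differ:

```python
from typing import List

def chunk_tokens(tokens: List[str], size: int = 1024, overlap: int = 200) -> List[List[str]]:
    """Split a token sequence into windows of `size` tokens,
    each sharing `overlap` tokens with its predecessor."""
    if len(tokens) <= size:
        return [tokens]
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens) - overlap, step)]
```

Each window then flows through extraction and embedding independently, with the overlap preserving context across chunk boundaries.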
Hybrid Nodes
Each chunk is stored in two roles:
- As a (:Chunk) node carrying the text and its embedding
- Connected to (:Entity) nodes via [:MENTIONS] relationships
This preserves source text for grounding while enabling abstract graph queries.
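Assuming chunk text lives on c.text and entity names on e.name (property names are illustrative), grounding a retrieved entity back to its source text is a single hop with the official Python driver:

```python
from neo4j import GraphDatabase

GROUNDING_QUERY = """
MATCH (c:Chunk)-[:MENTIONS]->(e:Entity {name: $name})
RETURN c.text AS text
LIMIT $k
"""

def source_chunks(driver, entity_name: str, k: int = 5) -> list:
    """Fetch the raw chunk text that mentions a given entity, for citation."""
    with driver.session() as session:
        return [record["text"] for record in session.run(GROUNDING_QUERY, name=entity_name, k=k)]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
print(source_chunks(driver, "Acme Corp"))
```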
Retrieval System (src/graph_rag_service/retrieval/)
Tools
- VectorSearchTool: Semantic similarity using embeddings
- GraphTraversalTool: Relationship exploration and path finding
- CypherGenerationTool: Text-to-Cypher with validation
- MetadataFilterTool: Structured queries on attributes
Agent Workflow (LangGraph)
```
[Query] → [Decompose] → [Route] → [Vector / Graph / Cypher] → [Synthesize] → [Response]
               ↑                                                    │
               └────────────────────────────────────────────────────┘
                              (iterative refinement)
```
Hallucination Guards
- Schema Injection: Prompt includes allowed entity/relationship types
- Cypher Validation: Parse and validate against whitelist
- Self-Correction: Feed errors back to LLM to fix syntax
- Fallback: If graph fails, degrade to vector search
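Put together, the guards reduce to a loop like the sketch below. It leans on the documented LLMProvider.complete and GraphStore.execute_query interfaces; validate_cypher and vector_search are hypothetical stand-ins for the real validator and VectorSearchTool.

```python
async def guarded_cypher_answer(question: str, llm, store, schema: str,
                                max_fixes: int = 2):
    # Schema injection: the prompt names only the allowed labels and relations.
    query = await llm.complete(
        f"Graph schema:\n{schema}\n\nWrite a Cypher query answering: {question}")
    for _ in range(max_fixes):
        try:
            validate_cypher(query, schema)              # parse + whitelist check (hypothetical)
            return await store.execute_query(query, {})
        except Exception as err:
            # Self-correction: hand the error back to the model and retry.
            query = await llm.complete(
                f"This Cypher failed with: {err}\nFix it:\n{query}")
    # Fallback: degrade gracefully to pure vector search (hypothetical helper).
    return await vector_search(question)
```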
API Layer (src/graph_rag_service/api/)
Endpoints
- POST /api/auth/login: Get a JWT token
- POST /api/documents/upload: Upload a document (returns task ID)
- GET /api/documents/status/{task_id}: Check ingestion progress
- POST /api/query: Execute an agentic query
- GET /api/ontology: Get the current ontology schema
- PUT /api/ontology: Update the ontology (admin only)
- GET /api/graph/visualization: Get graph data for visualization
- GET /api/system/health: System health check
- GET /api/system/stats: System statistics
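A hypothetical end-to-end session against a local instance; payload field names like access_token and task_id are assumptions about the response shapes:

```python
import requests

BASE = "http://localhost:8000"  # assumed local deployment

# Authenticate and build the bearer header.
token = requests.post(f"{BASE}/api/auth/login",
                      data={"username": "admin", "password": "secret"}).json()["access_token"]
headers = {"Authorization": f"Bearer {token}"}

# Upload a document; ingestion runs asynchronously, so we get a task ID back.
with open("report.pdf", "rb") as f:
    task = requests.post(f"{BASE}/api/documents/upload",
                         files={"file": f}, headers=headers).json()

# Poll ingestion progress, then run an agentic query.
status = requests.get(f"{BASE}/api/documents/status/{task['task_id']}", headers=headers).json()
answer = requests.post(f"{BASE}/api/query",
                       json={"query": "Who acquired Acme Corp?"}, headers=headers).json()
```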
Authentication
- JWT tokens with configurable expiration (default: 30 min)
- RBAC with scopes: read, write, admin
- Dependency injection for protected endpoints (see the sketch below)
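A minimal sketch of the dependency-injection pattern with PyJWT; the claim layout (a scopes list inside the token payload) is an assumption:

```python
from fastapi import Depends, HTTPException
from fastapi.security import OAuth2PasswordBearer
import jwt  # PyJWT

SECRET_KEY = "change-me"  # loaded from SECRET_KEY in .env in practice
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/auth/login")

def require_scope(scope: str):
    """Build a FastAPI dependency that rejects tokens lacking `scope`."""
    def dependency(token: str = Depends(oauth2_scheme)) -> dict:
        try:
            payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
        except jwt.PyJWTError:
            raise HTTPException(status_code=401, detail="Invalid or expired token")
        if scope not in payload.get("scopes", []):
            raise HTTPException(status_code=403, detail="Insufficient scope")
        return payload
    return dependency

# Usage on an admin-only route:
# @app.put("/api/ontology")
# async def update_ontology(user: dict = Depends(require_scope("admin"))): ...
```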
Workers (src/graph_rag_service/workers/)
Celery Tasks
- ingest_document: Process a single document
- ingest_documents_batch: Process multiple documents
- health_check: Worker health verification
Configuration
- Broker: Redis
- Result Backend: Redis
- Serializer: JSON
- Task timeout: 1 hour (configurable)
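Wired up, that configuration is a few lines; the broker/backend URLs are illustrative, since the real service reads them from .env:

```python
from celery import Celery

app = Celery(
    "graph_rag_service",
    broker="redis://localhost:6379/0",    # broker: Redis
    backend="redis://localhost:6379/1",   # result backend: Redis
)
app.conf.update(
    task_serializer="json",
    result_serializer="json",
    accept_content=["json"],
    task_time_limit=3600,  # 1 hour, configurable
)

@app.task(name="health_check")
def health_check() -> str:
    """Trivial task used to verify that a worker is alive."""
    return "ok"
```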
Observability (src/graph_rag_service/observability/)
OpenTelemetry Integration
- Traces: Agent reasoning steps, tool calls, database queries
- Metrics:
  - documents_ingested: counter
  - queries_executed: counter
  - query_duration_seconds: histogram
  - entities_extracted: counter
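Declaring these instruments with the OpenTelemetry metrics API looks roughly like this; the meter name and recording sites are illustrative:

```python
from opentelemetry import metrics

meter = metrics.get_meter("graph_rag_service")

documents_ingested = meter.create_counter(
    "documents_ingested", description="Documents processed by the ingestion pipeline")
queries_executed = meter.create_counter(
    "queries_executed", description="Agentic queries served")
query_duration = meter.create_histogram(
    "query_duration_seconds", unit="s", description="End-to-end query latency")

# Inside the respective handlers:
documents_ingested.add(1)
queries_executed.add(1)
query_duration.record(0.42)
```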
Structured Logging
- Log level: INFO (configurable)
- Format: %(asctime)s - %(name)s - %(levelname)s - %(message)s
- All async operations logged with context
Configuration
Environment Variables
Key settings in .env:
- Neo4j: NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD
- Redis: REDIS_HOST, REDIS_PORT
- LLM Provider: DEFAULT_LLM_PROVIDER (openai/anthropic/gemini/ollama)
- API Keys: OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY
- Ollama: OLLAMA_BASE_URL, OLLAMA_MODEL, OLLAMA_EMBEDDING_MODEL
- Security: SECRET_KEY, ACCESS_TOKEN_EXPIRE_MINUTES
Tuning Parameters
- CHUNK_SIZE: 1024 (text chunk size)
- CHUNK_OVERLAP: 200 (overlap between chunks)
- MAX_AGENT_ITERATIONS: 5 (max reasoning steps)
- AGENT_TIMEOUT_SECONDS: 30 (query timeout)
- ENTITY_RESOLUTION_THRESHOLD: 0.85 (similarity threshold)
- DEFAULT_TOP_K: 5 (retrieval results)
- GRAPH_MAX_DEPTH: 3 (graph traversal depth)
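One plausible way to load these with defaults, assuming pydantic-settings (the actual settings module may be wired differently):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class TuningSettings(BaseSettings):
    """Defaults mirror the values above; any can be overridden in .env."""
    model_config = SettingsConfigDict(env_file=".env")

    CHUNK_SIZE: int = 1024
    CHUNK_OVERLAP: int = 200
    MAX_AGENT_ITERATIONS: int = 5
    AGENT_TIMEOUT_SECONDS: int = 30
    ENTITY_RESOLUTION_THRESHOLD: float = 0.85
    DEFAULT_TOP_K: int = 5
    GRAPH_MAX_DEPTH: int = 3

settings = TuningSettings()
```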
Deployment
Local Development
```bash
# 1. Ensure Neo4j and Redis are running
# 2. Configure .env with connection details

# 3. Start the API server
./start-server.sh   # or start-server.bat on Windows

# 4. Start the workers
./start-worker.sh   # or start-worker.bat on Windows
```
Production Considerations
- Database: Use managed Neo4j (Aura) or self-hosted cluster
- Redis: Use managed Redis (AWS ElastiCache, Redis Cloud)
- Worker Scaling: Add more Celery workers based on ingestion load
- API Scaling: Run multiple API instances behind load balancer
- Monitoring: Integrate with Prometheus/Grafana for metrics
- Secrets: Use secret management (AWS Secrets Manager, HashiCorp Vault)
Extensibility
Adding New LLM Provider
- Implement the LLMProvider interface
- Add it to the LLMFactory.create() method
- Update config with the new provider's settings
Adding New Graph Database
- Implement the GraphStore interface
- Update IngestionPipeline to use the new store
- Test with existing workflows
Custom Retrieval Tools
- Create a new tool class with a run() method
- Add it to AgentRetrievalSystem.tools
- Update the routing logic in _route_query()
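A hypothetical custom tool following those steps; the run() signature and store usage mirror the documented interfaces, and the keyword-matching Cypher is illustrative:

```python
class KeywordSearchTool:
    """Example custom tool: exact keyword lookup over stored chunk text."""
    name = "keyword_search"

    def __init__(self, store):
        self.store = store  # any GraphStore implementation

    async def run(self, query: str) -> list:
        cypher = ("MATCH (c:Chunk) WHERE toLower(c.text) CONTAINS toLower($q) "
                  "RETURN c.text AS text LIMIT 5")
        return await self.store.execute_query(cypher, {"q": query})

# Registration, per the steps above:
#   agent.tools["keyword_search"] = KeywordSearchTool(store)
#   ...then teach _route_query() when to dispatch to it.
```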
Testing Strategy
Unit Tests
- Test each component independently
- Mock external dependencies (Neo4j, Redis, LLMs)
- Focus on business logic
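The mocking pattern in practice, assuming pytest-asyncio; answer_query is a hypothetical stand-in for whatever entry point is under test:

```python
import pytest
from unittest.mock import AsyncMock

# answer_query is hypothetical; substitute the real entry point.

@pytest.mark.asyncio
async def test_query_falls_back_to_vector_search():
    # External services are mocked out; only routing/fallback logic runs.
    store = AsyncMock()
    store.execute_query.side_effect = RuntimeError("syntax error")  # force graph failure
    llm = AsyncMock()
    llm.complete.return_value = "MATCH (n) RETURN n"
    llm.embed.return_value = [0.0] * 1024

    await answer_query("test question", llm=llm, store=store)

    store.execute_query.assert_awaited()   # graph path was attempted
    llm.embed.assert_awaited()             # vector fallback engaged
```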
Integration Tests
- Test component interactions
- Use test database instances
- Verify end-to-end flows
Performance Tests
- Benchmark ingestion throughput
- Measure query latencies
- Stress test with concurrent requests
Future Enhancements
Phase 1 (Current MVP)
- ✅ Core ingestion pipeline
- ✅ Agentic retrieval system
- ✅ Multi-LLM support
- ✅ Entity resolution
- ✅ Async workers
Phase 2 (Next Steps)
- React frontend with visual ontology editor
- Graph visualization (D3.js/Cytoscape)
- Advanced ontology evolution with migrations
- Semantic cache with Redis
- Batch ingestion optimization
Phase 3 (Advanced Features)
- Multi-tenant support with data isolation
- Fine-tuned entity extraction models
- Graph neural network embeddings
- Automated ontology quality metrics
- Export/import ontology schemas
Troubleshooting
Common Issues
Neo4j Connection Failed
- Verify Neo4j is running and accessible
- Verify credentials in .env
- Try connecting with cypher-shell: cypher-shell -u neo4j -p password
Celery Worker Not Processing
- Check Redis is running: redis-cli ping
- Verify the broker URL in .env
- Check worker logs
Ollama Models Not Found
- Pull models: ollama pull llama3.2 && ollama pull bge-m3
- Verify Ollama is running: curl http://localhost:11434/api/tags
Query Returns No Results
- Verify documents are ingested: GET /api/system/stats
- Check the ontology exists: GET /api/ontology
- Try simpler queries first
Support
For issues or questions:
- Check documentation and troubleshooting guide
- Search existing GitHub issues
- Open new issue with:
- Clear description
- Steps to reproduce
- Environment details
- Relevant logs
Last Updated: February 2026
Version: 0.1.0