
Graph RAG Service - Project Documentation

System Architecture

Overview

The Graph RAG Service is built as a modular, production-grade platform with the following key components:

  1. API Gateway (FastAPI): Handles all HTTP requests, authentication, and routing
  2. Ingestion Pipeline: Processes documents and constructs knowledge graphs
  3. Retrieval Agent (LangGraph): Intelligent query routing and response synthesis
  4. Storage Layer: Neo4j for graph + vector storage
  5. Task Queue: Celery + Redis for async processing
  6. Observability: OpenTelemetry for tracing and metrics

Design Principles

1. No Vendor Lock-in

All core components are abstracted behind interfaces:

  • GraphStore: Can swap Neo4j for AWS Neptune
  • VectorStore: Supports multiple vector databases
  • LLMProvider: Works with any LLM (OpenAI, Anthropic, Gemini, Ollama)

2. Production-Ready

  • Async Processing: Non-blocking I/O for all database operations
  • Background Jobs: Celery workers handle heavy ingestion tasks
  • Authentication: JWT-based with RBAC support
  • Error Handling: Graceful degradation and fallback mechanisms
  • Observability: Full tracing and metrics collection

3. Intelligent Retrieval

The agentic system:

  • Decomposes complex queries into sub-queries
  • Dynamically selects retrieval methods (vector vs. graph vs. Cypher)
  • Validates outputs against schema (hallucination guard)
  • Provides reasoning chains for transparency

Components Deep Dive

Core Abstractions (src/graph_rag_service/core/)

GraphStore Interface

from abc import ABC, abstractmethod
from typing import List

# Entity and Relationship are the service's domain models.
class GraphStore(ABC):
    @abstractmethod
    async def create_node(self, entity: Entity) -> str: ...

    @abstractmethod
    async def create_relationship(self, relationship: Relationship) -> str: ...

    @abstractmethod
    async def execute_query(self, query: str, params: dict) -> List[dict]: ...

    @abstractmethod
    async def find_path(self, source: str, target: str, max_depth: int) -> List[dict]: ...

Implementation: Neo4jStore provides unified graph + vector storage using Neo4j 5.x vector capabilities.
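
A minimal sketch of the query path, assuming the official neo4j async driver; everything beyond the interface method shown is illustrative, and the remaining GraphStore methods are omitted:

from typing import List
from neo4j import AsyncGraphDatabase

class Neo4jStore:
    def __init__(self, uri: str, user: str, password: str):
        # Credentials come from NEO4J_URI / NEO4J_USER / NEO4J_PASSWORD
        self._driver = AsyncGraphDatabase.driver(uri, auth=(user, password))

    async def execute_query(self, query: str, params: dict) -> List[dict]:
        async with self._driver.session() as session:
            result = await session.run(query, params)
            return [record.data() async for record in result]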

LLMProvider Interface

from abc import ABC, abstractmethod
from typing import List

class LLMProvider(ABC):
    @abstractmethod
    async def complete(self, prompt: str, **kwargs) -> str: ...

    @abstractmethod
    async def embed(self, text: str) -> List[float]: ...

Implementation: UnifiedLLMProvider wraps OpenAI, Anthropic, Gemini, and Ollama with a consistent interface.
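
For illustration, roughly what the OpenAI branch of such a wrapper could look like (model names are placeholders; the other providers follow the same shape):

from typing import List
from openai import AsyncOpenAI

class OpenAIProvider(LLMProvider):
    def __init__(self, model: str = "gpt-4o-mini"):
        self._client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
        self._model = model

    async def complete(self, prompt: str, **kwargs) -> str:
        resp = await self._client.chat.completions.create(
            model=self._model,
            messages=[{"role": "user", "content": prompt}],
            **kwargs,
        )
        return resp.choices[0].message.content

    async def embed(self, text: str) -> List[float]:
        resp = await self._client.embeddings.create(
            model="text-embedding-3-small",  # placeholder embedding model
            input=text,
        )
        return resp.data[0].embedding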

Entity Resolution

Multi-stage resolution (stages 3-5 are sketched after this list):

  1. Blocking: Group by entity type and name similarity (fast reject)
  2. Semantic Check: Compare embeddings for deep similarity
  3. Threshold Matching: Configurable thresholds (0.85 default)
  4. Auto-merge: High confidence merges (>0.95)
  5. Human Review Queue: Medium confidence flagged for review (0.85-0.95)
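
A simplified sketch of stages 3-5, using the thresholds listed above; cosine() here is a plain-Python stand-in for however similarity is actually computed:

import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def resolve(sim: float, threshold: float = 0.85, auto_merge: float = 0.95) -> str:
    if sim > auto_merge:
        return "merge"          # stage 4: auto-merge on high confidence
    if sim >= threshold:
        return "review"         # stage 5: flag for human review (0.85-0.95)
    return "keep_separate"      # below threshold: treat as distinct entities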

Ingestion Pipeline (src/graph_rag_service/ingestion/)

Flow

  1. Document Processing: Extract text from PDF/TXT/MD/DOCX
  2. Chunking: Split into overlapping chunks (1024 tokens, 200 overlap; see the sketch after this list)
  3. Ontology Generation: LLM analyzes samples to propose entity/relationship types
  4. Entity Extraction: Extract entities and relationships per chunk
  5. Entity Resolution: Deduplicate and merge entities
  6. Embedding Generation: Create vector embeddings (BGE-M3)
  7. Graph Construction: Store in Neo4j with hybrid nodes
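
Step 2 is simple enough to sketch; the real pipeline counts model tokens, whereas this illustration treats the input as a pre-tokenized list:

def chunk(tokens: list[str], size: int = 1024, overlap: int = 200) -> list[list[str]]:
    # Slide a window of `size` tokens forward by `size - overlap` each step,
    # so consecutive chunks share `overlap` tokens.
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]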

Hybrid Nodes

Each chunk is stored as:

  • A (:Chunk) node with text and embedding
  • [:MENTIONS] relationships linking it to the (:Entity) nodes it references

This preserves source text for grounding while enabling abstract graph queries.
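
The hybrid write can be expressed in a single Cypher statement; the pattern below illustrates the shape, not the service's exact schema:

HYBRID_WRITE = """
MERGE (c:Chunk {id: $chunk_id})
SET c.text = $text, c.embedding = $embedding
WITH c
UNWIND $entity_ids AS eid
MATCH (e:Entity {id: eid})
MERGE (c)-[:MENTIONS]->(e)
"""

# e.g. await store.execute_query(HYBRID_WRITE, {"chunk_id": ..., ...})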

Retrieval System (src/graph_rag_service/retrieval/)

Tools

  1. VectorSearchTool: Semantic similarity using embeddings
  2. GraphTraversalTool: Relationship exploration and path finding
  3. CypherGenerationTool: Text-to-Cypher with validation
  4. MetadataFilterTool: Structured queries on attributes

Agent Workflow (LangGraph)

[Query] → [Decompose] → [Route] → [Vector/Graph/Cypher] → [Synthesize] → [Response]
                           ↑                                      ↓
                           └──────────────────────────────────────┘
                                    (Iterative refinement)
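
A minimal LangGraph sketch of this loop; the node functions are stubs standing in for the real LLM and tool calls, and the state fields are illustrative:

from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict, total=False):
    question: str
    route: str
    context: list
    answer: str

# Stub nodes: the real nodes call the LLM and the retrieval tools.
def decompose(s: AgentState): return {"route": "vector"}
def vector_search(s: AgentState): return {"context": ["..."]}
def graph_traverse(s: AgentState): return {"context": ["..."]}
def cypher_generate(s: AgentState): return {"context": ["..."]}
def synthesize(s: AgentState): return {"answer": "...", "route": "done"}

builder = StateGraph(AgentState)
for name, fn in [("decompose", decompose), ("vector", vector_search),
                 ("graph", graph_traverse), ("cypher", cypher_generate),
                 ("synthesize", synthesize)]:
    builder.add_node(name, fn)
builder.set_entry_point("decompose")
builder.add_conditional_edges("decompose", lambda s: s["route"],
                              {"vector": "vector", "graph": "graph", "cypher": "cypher"})
for tool in ("vector", "graph", "cypher"):
    builder.add_edge(tool, "synthesize")
# Synthesize either finishes or loops back for iterative refinement.
builder.add_conditional_edges("synthesize",
                              lambda s: END if s.get("route") == "done" else "decompose")
agent = builder.compile()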

Hallucination Guards

  • Schema Injection: Prompt includes allowed entity/relationship types
  • Cypher Validation: Parse generated queries and validate labels against a whitelist (see the sketch after this list)
  • Self-Correction: Feed errors back to LLM to fix syntax
  • Fallback: If graph fails, degrade to vector search
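
A deliberately naive sketch of the whitelist check; a real validator would parse the query properly, and the allowed set would be loaded from the ontology:

import re

ALLOWED = {"Chunk", "Entity", "MENTIONS"}  # illustrative; derived from the ontology

def cypher_violations(query: str) -> list[str]:
    # Collect every `:Label` / `:REL_TYPE` token and report any outside the whitelist.
    used = set(re.findall(r":([A-Za-z_]\w*)", query))
    return sorted(used - ALLOWED)  # empty list means the query passes

# cypher_violations("MATCH (c:Chunk)-[:WROTE]->(e:Entity)") -> ["WROTE"]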

API Layer (src/graph_rag_service/api/)

Endpoints

  • POST /api/auth/login: Get JWT token
  • POST /api/documents/upload: Upload document (returns task ID)
  • GET /api/documents/status/{task_id}: Check ingestion progress
  • POST /api/query: Execute agentic query
  • GET /api/ontology: Get current ontology schema
  • PUT /api/ontology: Update ontology (admin only)
  • GET /api/graph/visualization: Get graph data for visualization
  • GET /api/system/health: System health check
  • GET /api/system/stats: System statistics

Authentication

  • JWT tokens with configurable expiration (default: 30 min)
  • RBAC with scopes: read, write, admin
  • Dependency injection for protected endpoints
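
A sketch of how such a dependency could enforce scopes, assuming PyJWT and an OAuth2 bearer flow; claim names and secret handling are illustrative:

import jwt
from fastapi import Depends, HTTPException
from fastapi.security import OAuth2PasswordBearer

SECRET_KEY = "change-me"  # loaded from SECRET_KEY in the real service
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/auth/login")

def require_scope(scope: str):
    def checker(token: str = Depends(oauth2_scheme)) -> dict:
        try:
            claims = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
        except jwt.PyJWTError:
            raise HTTPException(status_code=401, detail="Invalid token")
        if scope not in claims.get("scopes", []):
            raise HTTPException(status_code=403, detail=f"Missing scope: {scope}")
        return claims
    return checker

# e.g. @app.put("/api/ontology", dependencies=[Depends(require_scope("admin"))])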

Workers (src/graph_rag_service/workers/)

Celery Tasks

  • ingest_document: Process single document
  • ingest_documents_batch: Process multiple documents
  • health_check: Worker health verification

Configuration

  • Broker: Redis
  • Result Backend: Redis
  • Serializer: JSON
  • Task timeout: 1 hour (configurable)
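
Roughly how this wiring could look; connection URLs are illustrative, and the real task body runs the async ingestion pipeline:

from celery import Celery

celery_app = Celery(
    "graph_rag_service",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
)
celery_app.conf.update(
    task_serializer="json",
    result_serializer="json",
    accept_content=["json"],
    task_time_limit=3600,  # 1 hour, as listed above
)

@celery_app.task(name="ingest_document")
def ingest_document(path: str) -> dict:
    return {"status": "queued", "path": path}  # stub body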

Observability (src/graph_rag_service/observability/)

OpenTelemetry Integration

  • Traces: Agent reasoning steps, tool calls, database queries
  • Metrics:
    • documents_ingested: Counter
    • queries_executed: Counter
    • query_duration_seconds: Histogram
    • entities_extracted: Counter
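
Declaring these instruments with the OpenTelemetry metrics API would look roughly like this (the meter name is a placeholder):

from opentelemetry import metrics

meter = metrics.get_meter("graph_rag_service")
documents_ingested = meter.create_counter("documents_ingested")
queries_executed = meter.create_counter("queries_executed")
query_duration = meter.create_histogram("query_duration_seconds", unit="s")
entities_extracted = meter.create_counter("entities_extracted")

# e.g. query_duration.record(elapsed, attributes={"route": "vector"})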

Structured Logging

  • Log level: INFO (configurable)
  • Format: %(asctime)s - %(name)s - %(levelname)s - %(message)s
  • All async operations logged with context
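
For reference, the equivalent stdlib setup:

import logging

logging.basicConfig(
    level=logging.INFO,  # configurable, per the note above
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger("graph_rag_service")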

Configuration

Environment Variables

Key settings in .env:

  • Neo4j: NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD
  • Redis: REDIS_HOST, REDIS_PORT
  • LLM Provider: DEFAULT_LLM_PROVIDER (openai/anthropic/gemini/ollama)
  • API Keys: OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY
  • Ollama: OLLAMA_BASE_URL, OLLAMA_MODEL, OLLAMA_EMBEDDING_MODEL
  • Security: SECRET_KEY, ACCESS_TOKEN_EXPIRE_MINUTES

Tuning Parameters

  • CHUNK_SIZE: 1024 (text chunk size)
  • CHUNK_OVERLAP: 200 (overlap between chunks)
  • MAX_AGENT_ITERATIONS: 5 (max reasoning steps)
  • AGENT_TIMEOUT_SECONDS: 30 (query timeout)
  • ENTITY_RESOLUTION_THRESHOLD: 0.85 (similarity threshold)
  • DEFAULT_TOP_K: 5 (retrieval results)
  • GRAPH_MAX_DEPTH: 3 (graph traversal depth)
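
One common way to surface these knobs, assuming pydantic-settings; the project's actual Settings class may differ:

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    CHUNK_SIZE: int = 1024
    CHUNK_OVERLAP: int = 200
    MAX_AGENT_ITERATIONS: int = 5
    AGENT_TIMEOUT_SECONDS: int = 30
    ENTITY_RESOLUTION_THRESHOLD: float = 0.85
    DEFAULT_TOP_K: int = 5
    GRAPH_MAX_DEPTH: int = 3

settings = Settings()  # values in .env override these defaults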

Deployment

Local Development

# 1. Ensure Neo4j and Redis are running
# 2. Configure .env with connection details

# 3. Start API server
./start-server.sh  # or start-server.bat on Windows

# 4. Start workers
./start-worker.sh  # or start-worker.bat on Windows

Production Considerations

  1. Database: Use managed Neo4j (Aura) or self-hosted cluster
  2. Redis: Use managed Redis (AWS ElastiCache, Redis Cloud)
  3. Worker Scaling: Add more Celery workers based on ingestion load
  4. API Scaling: Run multiple API instances behind load balancer
  5. Monitoring: Integrate with Prometheus/Grafana for metrics
  6. Secrets: Use secret management (AWS Secrets Manager, HashiCorp Vault)

Extensibility

Adding New LLM Provider

  1. Implement LLMProvider interface
  2. Add to LLMFactory.create() method
  3. Update config with new provider settings
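
In outline, building on the LLMProvider interface shown earlier (LLMFactory's real signature may differ, and the provider name is hypothetical):

class MyProvider(LLMProvider):
    async def complete(self, prompt: str, **kwargs) -> str:
        ...  # call your backend here

    async def embed(self, text: str) -> list[float]:
        ...

class LLMFactory:
    @staticmethod
    def create(name: str) -> LLMProvider:
        if name == "myprovider":
            return MyProvider()
        raise ValueError(f"Unknown provider: {name}")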

Adding New Graph Database

  1. Implement GraphStore interface
  2. Update IngestionPipeline to use new store
  3. Test with existing workflows

Custom Retrieval Tools

  1. Create new tool class with run() method
  2. Add to AgentRetrievalSystem.tools
  3. Update routing logic in _route_query()
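
A skeleton for such a tool; the constructor argument, tool name, and Cypher are illustrative:

class MetadataDateFilterTool:
    name = "metadata_date_filter"

    def __init__(self, store):
        self.store = store  # any GraphStore implementation

    async def run(self, query: str, **kwargs) -> list[dict]:
        # Structured filter on chunk attributes rather than semantic search.
        return await self.store.execute_query(
            "MATCH (c:Chunk) WHERE c.date >= $since RETURN c LIMIT $k",
            {"since": kwargs.get("since"), "k": kwargs.get("top_k", 5)},
        )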

Testing Strategy

Unit Tests

  • Test each component independently
  • Mock external dependencies (Neo4j, Redis, LLMs)
  • Focus on business logic

Integration Tests

  • Test component interactions
  • Use test database instances
  • Verify end-to-end flows

Performance Tests

  • Benchmark ingestion throughput
  • Measure query latencies
  • Stress test with concurrent requests

Future Enhancements

Phase 1 (Current MVP)

  • ✅ Core ingestion pipeline
  • ✅ Agentic retrieval system
  • ✅ Multi-LLM support
  • ✅ Entity resolution
  • ✅ Async workers

Phase 2 (Next Steps)

  • React frontend with visual ontology editor
  • Graph visualization (D3.js/Cytoscape)
  • Advanced ontology evolution with migrations
  • Semantic cache with Redis
  • Batch ingestion optimization

Phase 3 (Advanced Features)

  • Multi-tenant support with data isolation
  • Fine-tuned entity extraction models
  • Graph neural network embeddings
  • Automated ontology quality metrics
  • Export/import ontology schemas

Troubleshooting

Common Issues

Neo4j Connection Failed

  • Verify Neo4j is running and accessible
  • Verify credentials in .env
  • Try connecting with cypher-shell: cypher-shell -u neo4j -p password

Celery Worker Not Processing

  • Check Redis is running: redis-cli ping
  • Verify broker URL in .env
  • Check worker logs

Ollama Models Not Found

  • Pull models: ollama pull llama3.2 && ollama pull bge-m3
  • Verify Ollama is running: curl http://localhost:11434/api/tags

Query Returns No Results

  • Verify documents are ingested: GET /api/system/stats
  • Check ontology exists: GET /api/ontology
  • Try simpler queries first

Support

For issues or questions:

  1. Check documentation and troubleshooting guide
  2. Search existing GitHub issues
  3. Open new issue with:
    • Clear description
    • Steps to reproduce
    • Environment details
    • Relevant logs

Last Updated: February 2026
Version: 0.1.0