Spaces:

anky2002
/

graph-rag

Configuration error

App Files Files Community

graph-rag / ARCHITECTURE.md

GitHub Action

Automated sync to Hugging Face

c11a2f8 about 2 hours ago

preview code

raw

history blame contribute delete

10.1 kB

	# Graph RAG Service - Project Documentation

	## System Architecture

	### Overview
	The Graph RAG Service is built as a modular, production-grade platform with the following key components:

	1. API Gateway (FastAPI): Handles all HTTP requests, authentication, and routing
	2. Ingestion Pipeline: Processes documents and constructs knowledge graphs
	3. Retrieval Agent (LangGraph): Intelligent query routing and response synthesis
	4. Storage Layer: Neo4j for graph + vector storage
	5. Task Queue: Celery + Redis for async processing
	6. Observability: OpenTelemetry for tracing and metrics

	### Design Principles

	#### 1. No Vendor Lock-in
	All core components are abstracted behind interfaces:
	- `GraphStore`: Can swap Neo4j for AWS Neptune
	- `VectorStore`: Supports multiple vector databases
	- `LLMProvider`: Works with any LLM (OpenAI, Anthropic, Gemini, Ollama)

	#### 2. Production-Ready
	- Async Processing: Non-blocking I/O for all database operations
	- Background Jobs: Celery workers handle heavy ingestion tasks
	- Authentication: JWT-based with RBAC support
	- Error Handling: Graceful degradation and fallback mechanisms
	- Observability: Full tracing and metrics collection

	#### 3. Intelligent Retrieval
	The agentic system:
	- Decomposes complex queries into sub-queries
	- Dynamically selects retrieval methods (vector vs graph vs cypher)
	- Validates outputs against schema (hallucination guard)
	- Provides reasoning chains for transparency

	## Components Deep Dive

	### Core Abstractions (`src/graph_rag_service/core/`)

	#### GraphStore Interface
	```python
	class GraphStore(ABC):
	@abstractmethod
	async def create_node(entity: Entity) -> str
	@abstractmethod
	async def create_relationship(relationship: Relationship) -> str
	@abstractmethod
	async def execute_query(query: str, params: dict) -> List[dict]
	@abstractmethod
	async def find_path(source: str, target: str, max_depth: int) -> List[dict]
	```

	Implementation: `Neo4jStore` provides unified graph + vector storage using Neo4j 5.x vector capabilities.

	#### LLMProvider Interface
	```python
	class LLMProvider(ABC):
	@abstractmethod
	async def complete(prompt: str, **kwargs) -> str
	@abstractmethod
	async def embed(text: str) -> List[float]
	```

	Implementation: `UnifiedLLMProvider` wraps OpenAI, Anthropic, Gemini, and Ollama with a consistent interface.

	#### Entity Resolution
	Multi-stage resolution:
	1. Blocking: Group by entity type and name similarity (fast reject)
	2. Semantic Check: Compare embeddings for deep similarity
	3. Threshold Matching: Configurable thresholds (0.85 default)
	4. Auto-merge: High confidence merges (>0.95)
	5. Human Review Queue: Medium confidence flagged for review (0.85-0.95)

	### Ingestion Pipeline (`src/graph_rag_service/ingestion/`)

	#### Flow
	1. Document Processing: Extract text from PDF/TXT/MD/DOCX
	2. Chunking: Split into overlapping chunks (1024 tokens, 200 overlap)
	3. Ontology Generation: LLM analyzes samples to propose entity/relationship types
	4. Entity Extraction: Extract entities and relationships per chunk
	5. Entity Resolution: Deduplicate and merge entities
	6. Embedding Generation: Create vector embeddings (BGE-M3)
	7. Graph Construction: Store in Neo4j with hybrid nodes

	#### Hybrid Nodes
	Each chunk is stored as both:
	- A `(:Chunk)` node with text and embedding
	- Connected to `(:Entity)` nodes via `[:MENTIONS]` relationships

	This preserves source text for grounding while enabling abstract graph queries.

	### Retrieval System (`src/graph_rag_service/retrieval/`)

	#### Tools
	1. VectorSearchTool: Semantic similarity using embeddings
	2. GraphTraversalTool: Relationship exploration and path finding
	3. CypherGenerationTool: Text-to-Cypher with validation
	4. MetadataFilterTool: Structured queries on attributes

	#### Agent Workflow (LangGraph)
	```
	[Query] → [Decompose] → [Route] → [Vector/Graph/Cypher] → [Synthesize] → [Response]
	↑ ↓
	└─────────────────────────────────────┘
	(Iterative refinement)
	```

	#### Hallucination Guards
	- Schema Injection: Prompt includes allowed entity/relationship types
	- Cypher Validation: Parse and validate against whitelist
	- Self-Correction: Feed errors back to LLM to fix syntax
	- Fallback: If graph fails, degrade to vector search

	### API Layer (`src/graph_rag_service/api/`)

	#### Endpoints
	- `POST /api/auth/login`: Get JWT token
	- `POST /api/documents/upload`: Upload document (returns task ID)
	- `GET /api/documents/status/{task_id}`: Check ingestion progress
	- `POST /api/query`: Execute agentic query
	- `GET /api/ontology`: Get current ontology schema
	- `PUT /api/ontology`: Update ontology (admin only)
	- `GET /api/graph/visualization`: Get graph data for visualization
	- `GET /api/system/health`: System health check
	- `GET /api/system/stats`: System statistics

	#### Authentication
	- JWT tokens with configurable expiration (default: 30 min)
	- RBAC with scopes: `read`, `write`, `admin`
	- Dependency injection for protected endpoints

	### Workers (`src/graph_rag_service/workers/`)

	#### Celery Tasks
	- `ingest_document`: Process single document
	- `ingest_documents_batch`: Process multiple documents
	- `health_check`: Worker health verification

	#### Configuration
	- Broker: Redis
	- Result Backend: Redis
	- Serializer: JSON
	- Task timeout: 1 hour (configurable)

	### Observability (`src/graph_rag_service/observability/`)

	#### OpenTelemetry Integration
	- Traces: Agent reasoning steps, tool calls, database queries
	- Metrics:
	- `documents_ingested`: Counter
	- `queries_executed`: Counter
	- `query_duration_seconds`: Histogram
	- `entities_extracted`: Counter

	#### Structured Logging
	- Log level: INFO (configurable)
	- Format: `%(asctime)s - %(name)s - %(levelname)s - %(message)s`
	- All async operations logged with context

	## Configuration

	### Environment Variables
	Key settings in `.env`:
	- Neo4j: `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD`
	- Redis: `REDIS_HOST`, `REDIS_PORT`
	- LLM Provider: `DEFAULT_LLM_PROVIDER` (openai/anthropic/gemini/ollama)
	- API Keys: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`
	- Ollama: `OLLAMA_BASE_URL`, `OLLAMA_MODEL`, `OLLAMA_EMBEDDING_MODEL`
	- Security: `SECRET_KEY`, `ACCESS_TOKEN_EXPIRE_MINUTES`

	### Tuning Parameters
	- `CHUNK_SIZE`: 1024 (text chunk size)
	- `CHUNK_OVERLAP`: 200 (overlap between chunks)
	- `MAX_AGENT_ITERATIONS`: 5 (max reasoning steps)
	- `AGENT_TIMEOUT_SECONDS`: 30 (query timeout)
	- `ENTITY_RESOLUTION_THRESHOLD`: 0.85 (similarity threshold)
	- `DEFAULT_TOP_K`: 5 (retrieval results)
	- `GRAPH_MAX_DEPTH`: 3 (graph traversal depth)

	## Deployment

	### Local Development
	```bash
	# 1. Ensure Neo4j and Redis are running
	# 2. Configure .env with connection details

	# 3. Start API server
	./start-server.sh # or start-server.bat on Windows

	# 4. Start workers
	./start-worker.sh # or start-worker.bat on Windows
	```

	### Production Considerations
	1. Database: Use managed Neo4j (Aura) or self-hosted cluster
	2. Redis: Use managed Redis (AWS ElastiCache, Redis Cloud)
	3. Worker Scaling: Add more Celery workers based on ingestion load
	4. API Scaling: Run multiple API instances behind load balancer
	5. Monitoring: Integrate with Prometheus/Grafana for metrics
	6. Secrets: Use secret management (AWS Secrets Manager, HashiCorp Vault)

	## Extensibility

	### Adding New LLM Provider
	1. Implement `LLMProvider` interface
	2. Add to `LLMFactory.create()` method
	3. Update config with new provider settings

	### Adding New Graph Database
	1. Implement `GraphStore` interface
	2. Update `IngestionPipeline` to use new store
	3. Test with existing workflows

	### Custom Retrieval Tools
	1. Create new tool class with `run()` method
	2. Add to `AgentRetrievalSystem.tools`
	3. Update routing logic in `_route_query()`

	## Testing Strategy

	### Unit Tests
	- Test each component independently
	- Mock external dependencies (Neo4j, Redis, LLMs)
	- Focus on business logic

	### Integration Tests
	- Test component interactions
	- Use test database instances
	- Verify end-to-end flows

	### Performance Tests
	- Benchmark ingestion throughput
	- Measure query latencies
	- Stress test with concurrent requests

	## Future Enhancements

	### Phase 1 (Current MVP)
	- ✅ Core ingestion pipeline
	- ✅ Agentic retrieval system
	- ✅ Multi-LLM support
	- ✅ Entity resolution
	- ✅ Async workers

	### Phase 2 (Next Steps)
	- [ ] React frontend with visual ontology editor
	- [ ] Graph visualization (D3.js/Cytoscape)
	- [ ] Advanced ontology evolution with migrations
	- [ ] Semantic cache with Redis
	- [ ] Batch ingestion optimization

	### Phase 3 (Advanced Features)
	- [ ] Multi-tenant support with data isolation
	- [ ] Fine-tuned entity extraction models
	- [ ] Graph neural network embeddings
	- [ ] Automated ontology quality metrics
	- [ ] Export/import ontology schemas

	## Troubleshooting

	### Common Issues

	#### Neo4j Connection Failed
	- Verify Neo4j is running and accessible
	- Verify credentials in `.env`
	- Try connecting with cypher-shell: `cypher-shell -u neo4j -p password`

	#### Celery Worker Not Processing
	- Check Redis is running: `redis-cli ping`
	- Verify broker URL in `.env`
	- Check worker logs

	#### Ollama Models Not Found
	- Pull models: `ollama pull llama3.2 && ollama pull bge-m3`
	- Verify Ollama is running: `curl http://localhost:11434/api/tags`

	#### Query Returns No Results
	- Verify documents are ingested: `GET /api/system/stats`
	- Check ontology exists: `GET /api/ontology`
	- Try simpler queries first

	## Support

	For issues or questions:
	1. Check documentation and troubleshooting guide
	2. Search existing GitHub issues
	3. Open new issue with:
	- Clear description
	- Steps to reproduce
	- Environment details
	- Relevant logs

	---

	Last Updated: February 2026
	Version: 0.1.0