# Graph RAG Service - Project Documentation

## System Architecture

### Overview
The Graph RAG Service is built as a modular, production-grade platform with the following key components:

1. **API Gateway (FastAPI)**: Handles all HTTP requests, authentication, and routing
2. **Ingestion Pipeline**: Processes documents and constructs knowledge graphs
3. **Retrieval Agent (LangGraph)**: Intelligent query routing and response synthesis
4. **Storage Layer**: Neo4j for graph + vector storage
5. **Task Queue**: Celery + Redis for async processing
6. **Observability**: OpenTelemetry for tracing and metrics

### Design Principles

#### 1. No Vendor Lock-in
All core components are abstracted behind interfaces:
- `GraphStore`: Can swap Neo4j for AWS Neptune
- `VectorStore`: Supports multiple vector databases
- `LLMProvider`: Works with any LLM (OpenAI, Anthropic, Gemini, Ollama)

#### 2. Production-Ready
- **Async Processing**: Non-blocking I/O for all database operations
- **Background Jobs**: Celery workers handle heavy ingestion tasks
- **Authentication**: JWT-based with RBAC support
- **Error Handling**: Graceful degradation and fallback mechanisms
- **Observability**: Full tracing and metrics collection

#### 3. Intelligent Retrieval
The agentic system:
- Decomposes complex queries into sub-queries
- Dynamically selects a retrieval method per sub-query (vector search, graph traversal, or generated Cypher)
- Validates outputs against schema (hallucination guard)
- Provides reasoning chains for transparency

## Components Deep Dive

### Core Abstractions (`src/graph_rag_service/core/`)

#### GraphStore Interface
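```python
from abc import ABC, abstractmethod
from typing import List

# Entity and Relationship are the project's domain models (forward references here).
class GraphStore(ABC):
    @abstractmethod
    async def create_node(self, entity: "Entity") -> str: ...

    @abstractmethod
    async def create_relationship(self, relationship: "Relationship") -> str: ...

    @abstractmethod
    async def execute_query(self, query: str, params: dict) -> List[dict]: ...

    @abstractmethod
    async def find_path(self, source: str, target: str, max_depth: int) -> List[dict]: ...
```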

Implementation: `Neo4jStore` provides unified graph + vector storage using Neo4j 5.x vector capabilities.

#### LLMProvider Interface
```python
from abc import ABC, abstractmethod
from typing import List

class LLMProvider(ABC):
    @abstractmethod
    async def complete(self, prompt: str, **kwargs) -> str: ...

    @abstractmethod
    async def embed(self, text: str) -> List[float]: ...
```

Implementation: `UnifiedLLMProvider` wraps OpenAI, Anthropic, Gemini, and Ollama with a consistent interface.

#### Entity Resolution
Multi-stage resolution:
1. **Blocking**: Group by entity type and name similarity (fast reject)
2. **Semantic Check**: Compare embeddings for deep similarity
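3. **Threshold Matching**: Apply a configurable similarity threshold (default 0.85); pairs scoring below it are treated as distinct entities
4. **Auto-merge**: Merge high-confidence matches automatically (similarity > 0.95)
5. **Human Review Queue**: Flag medium-confidence matches for review (similarity 0.85-0.95)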
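
The threshold logic above can be sketched in a few lines. This is a minimal illustration, not the service's actual resolver: the names `Decision`, `cosine_similarity`, and `classify_pair` are hypothetical, and treating a score of exactly 0.95 as "review" is an assumption, since the ranges above do not say which side the boundary falls on.

```python
import math
from enum import Enum
from typing import Sequence


class Decision(Enum):
    DISTINCT = "distinct"          # below the review threshold
    HUMAN_REVIEW = "human_review"  # medium confidence, queued for a person
    AUTO_MERGE = "auto_merge"      # high confidence, merged automatically


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_pair(
    similarity: float,
    review_threshold: float = 0.85,  # documented default threshold
    merge_threshold: float = 0.95,   # documented auto-merge cutoff
) -> Decision:
    """Map a similarity score onto the three outcomes described above."""
    if similarity > merge_threshold:
        return Decision.AUTO_MERGE
    if similarity >= review_threshold:
        return Decision.HUMAN_REVIEW
    return Decision.DISTINCT
```

In a pipeline, `classify_pair` would only run on candidate pairs that survive the blocking stage, which keeps the number of embedding comparisons small.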

### Ingestion Pipeline (`src/graph_rag_service/ingestion/`)

#### Flow
1. **Document Processing**: Extract text from PDF/TXT/MD/DOCX
2. **Chunking**: Split into overlapping chunks (1024 tokens, 200 overlap)
3. **Ontology Generation**: LLM analyzes samples to propose entity/relationship types
4. **Entity Extraction**: Extract entities and relationships per chunk
5. **Entity Resolution**: Deduplicate and merge entities
6. **Embedding Generation**: Create vector embeddings (BGE-M3)
7. **Graph Construction**: Store in Neo4j with hybrid nodes
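
The chunking step (step 2) is a sliding window over the token sequence. The sketch below assumes the text has already been tokenized into a list of strings; the function name `chunk_tokens` is hypothetical, and the real pipeline may count tokens with a model-specific tokenizer rather than a plain list.

```python
from typing import List


def chunk_tokens(
    tokens: List[str],
    chunk_size: int = 1024,  # documented default chunk size
    overlap: int = 200,      # documented default overlap
) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, so no tail-only duplicate is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With the defaults, each new chunk starts 824 tokens after the previous one, so the last 200 tokens of one chunk reappear at the start of the next.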

#### Hybrid Nodes
Each chunk is stored in two forms at once:
- As a `(:Chunk)` node holding the chunk text and its embedding
- As part of the graph, linked to `(:Entity)` nodes via `[:MENTIONS]` relationships

This preserves source text for grounding while enabling abstract graph queries.
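
A write for one hybrid node might look roughly like the following. This is a sketch only: the property names `id`, `text`, `embedding`, and `name` and the helper `build_chunk_write` are assumptions for illustration, not the schema the service actually uses. The function only builds a parameterized Cypher statement; running it is left to whatever `GraphStore.execute_query` implementation is in place.

```python
from typing import Any, Dict, List, Tuple

# Assumed property names; only the labels and the MENTIONS type come from the docs.
CHUNK_WRITE_QUERY = """
MERGE (c:Chunk {id: $chunk_id})
SET c.text = $text, c.embedding = $embedding
WITH c
UNWIND $entity_names AS entity_name
MERGE (e:Entity {name: entity_name})
MERGE (c)-[:MENTIONS]->(e)
""".strip()


def build_chunk_write(
    chunk_id: str,
    text: str,
    embedding: List[float],
    entity_names: List[str],
) -> Tuple[str, Dict[str, Any]]:
    """Return a (query, params) pair that stores a chunk and its entity links."""
    params = {
        "chunk_id": chunk_id,
        "text": text,
        "embedding": embedding,
        "entity_names": entity_names,
    }
    return CHUNK_WRITE_QUERY, params
```

Using `MERGE` rather than `CREATE` keeps the write idempotent, so re-ingesting the same chunk does not duplicate nodes or relationships.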

### Retrieval System (`src/graph_rag_service/retrieval/`)

#### Tools
1. **VectorSearchTool**: Semantic similarity using embeddings
2. **GraphTraversalTool**: Relationship exploration and path finding
3. **CypherGenerationTool**: Text-to-Cypher with validation
4. **MetadataFilterTool**: Structured queries on attributes

#### Agent Workflow (LangGraph)
```
[Query] → [Decompose] → [Route] → [Vector/Graph/Cypher] → [Synthesize] → [Response]
                           ↑                                      ↓
                           └──────────────────────────────────────┘
                                    (Iterative refinement)
```
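
The control flow in the diagram can be approximated in plain Python. The sketch below does not use LangGraph and is not the service's agent: `run_agent` and all of its callable parameters are hypothetical stand-ins for the decompose, route, tool, and synthesize steps. The default of 5 iterations mirrors `MAX_AGENT_ITERATIONS`.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], List[str]]


def run_agent(
    query: str,
    decompose: Callable[[str], List[str]],
    route: Callable[[str], str],
    tools: Dict[str, Tool],
    synthesize: Callable[[str, List[str]], str],
    is_sufficient: Callable[[str, List[str]], bool],
    max_iterations: int = 5,
) -> str:
    """Decompose, route, retrieve, and synthesize until the answer is good enough."""
    evidence: List[str] = []
    answer = ""
    for _ in range(max_iterations):
        for sub_query in decompose(query):
            tool_name = route(sub_query)  # e.g. "vector", "graph", or "cypher"
            evidence.extend(tools[tool_name](sub_query))
        answer = synthesize(query, evidence)
        if is_sufficient(answer, evidence):
            break  # otherwise loop back for another refinement pass
    return answer
```

The iteration cap bounds both latency and LLM cost when the sufficiency check never passes.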

#### Hallucination Guards
- **Schema Injection**: Prompt includes allowed entity/relationship types
- **Cypher Validation**: Parse and validate against whitelist
- **Self-Correction**: Feed errors back to LLM to fix syntax
- **Fallback**: If graph fails, degrade to vector search
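
As an illustration of the whitelist check, the sketch below extracts node labels and relationship types from a generated query and compares them with the allowed ontology. It is a deliberate simplification: it uses regular expressions rather than a real Cypher parser, handles only single-label patterns such as `(p:Person)`, and the name `validate_cypher` is hypothetical.

```python
import re
from typing import List, Set

# Simplified patterns: "(var:Label" and "[var:TYPE"; multi-label nodes are not handled.
NODE_LABEL = re.compile(r"\(\s*\w*\s*:\s*(\w+)")
REL_TYPE = re.compile(r"\[\s*\w*\s*:\s*(\w+)")


def validate_cypher(
    query: str,
    allowed_labels: Set[str],
    allowed_rel_types: Set[str],
) -> List[str]:
    """Return a list of violations; an empty list means the query passed."""
    errors: List[str] = []
    for label in NODE_LABEL.findall(query):
        if label not in allowed_labels:
            errors.append(f"unknown node label: {label}")
    for rel_type in REL_TYPE.findall(query):
        if rel_type not in allowed_rel_types:
            errors.append(f"unknown relationship type: {rel_type}")
    return errors
```

The returned messages are the kind of feedback that can be passed back to the LLM for a self-correction attempt; if the query still fails after retries, the caller can fall back to vector search.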

### API Layer (`src/graph_rag_service/api/`)

#### Endpoints
- `POST /api/auth/login`: Get JWT token
- `POST /api/documents/upload`: Upload document (returns task ID)
- `GET /api/documents/status/{task_id}`: Check ingestion progress
- `POST /api/query`: Execute agentic query
- `GET /api/ontology`: Get current ontology schema
- `PUT /api/ontology`: Update ontology (admin only)
- `GET /api/graph/visualization`: Get graph data for visualization
- `GET /api/system/health`: System health check
- `GET /api/system/stats`: System statistics

#### Authentication
- JWT tokens with configurable expiration (default: 30 min)
- RBAC with scopes: `read`, `write`, `admin`
- Dependency injection for protected endpoints
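
The expiry and scope rules can be sketched without any web framework. The code below operates on already-decoded token claims and leaves signing and verification to a JWT library. `exp` and `sub` are standard JWT claims; the `scopes` claim name, `AuthError`, `make_claims`, and `require_scope` are assumptions for illustration.

```python
import time
from typing import Any, Dict, Iterable, Mapping, Optional


class AuthError(Exception):
    """Raised when a token is expired or lacks the required scope."""


def make_claims(
    subject: str,
    scopes: Iterable[str],
    expire_minutes: int = 30,  # documented default expiration
    now: Optional[float] = None,
) -> Dict[str, Any]:
    """Build the claims payload that would be signed into a JWT."""
    issued_at = time.time() if now is None else now
    return {
        "sub": subject,
        "scopes": sorted(set(scopes)),
        "exp": issued_at + expire_minutes * 60,
    }


def require_scope(
    claims: Mapping[str, Any],
    required_scope: str,
    now: Optional[float] = None,
) -> None:
    """Raise AuthError unless the claims are unexpired and carry the scope."""
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        raise AuthError("token expired")
    if required_scope not in claims.get("scopes", ()):
        raise AuthError(f"missing scope: {required_scope}")
```

In a FastAPI application, a check like `require_scope` would typically sit inside a dependency attached to each protected route, for example requiring `admin` on `PUT /api/ontology`.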

### Workers (`src/graph_rag_service/workers/`)

#### Celery Tasks
- `ingest_document`: Process single document
- `ingest_documents_batch`: Process multiple documents
- `health_check`: Worker health verification

#### Configuration
- Broker: Redis
- Result Backend: Redis
- Serializer: JSON
- Task timeout: 1 hour (configurable)
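
The settings above map onto Celery's standard lowercase configuration keys. The helper below is a sketch that only assembles a dictionary: `build_celery_config` is a hypothetical name, and using Redis database 0 for both broker and result backend is an assumption.

```python
from typing import Any, Dict


def build_celery_config(
    redis_host: str = "localhost",
    redis_port: int = 6379,
    redis_db: int = 0,            # assumed database index
    task_time_limit: int = 3600,  # 1 hour, in seconds
) -> Dict[str, Any]:
    """Assemble Celery settings for a Redis broker and result backend."""
    redis_url = f"redis://{redis_host}:{redis_port}/{redis_db}"
    return {
        "broker_url": redis_url,
        "result_backend": redis_url,
        "task_serializer": "json",
        "result_serializer": "json",
        "accept_content": ["json"],
        "task_time_limit": task_time_limit,
    }
```

The resulting dictionary can be applied to a Celery application with `app.conf.update(build_celery_config(...))`.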

### Observability (`src/graph_rag_service/observability/`)

#### OpenTelemetry Integration
- **Traces**: Agent reasoning steps, tool calls, database queries
- **Metrics**: 
  - `documents_ingested`: Counter
  - `queries_executed`: Counter
  - `query_duration_seconds`: Histogram
  - `entities_extracted`: Counter

#### Structured Logging
- Log level: INFO (configurable)
- Format: `%(asctime)s - %(name)s - %(levelname)s - %(message)s`
- All async operations logged with context
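
A logger with the format above can be set up with the standard library alone. This is a minimal sketch; the logger name `graph_rag_service` and the helper `configure_logging` are assumptions, and the actual service may wire logging differently.

```python
import logging
from typing import Optional, TextIO

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def configure_logging(
    level: int = logging.INFO,
    stream: Optional[TextIO] = None,  # None means sys.stderr
) -> logging.Logger:
    """Attach a single stream handler using the documented format."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))

    logger = logging.getLogger("graph_rag_service")  # assumed logger name
    logger.setLevel(level)
    logger.handlers = [handler]  # replace handlers so repeated calls do not duplicate output
    logger.propagate = False
    return logger
```

Child loggers such as `logging.getLogger("graph_rag_service.ingestion")` inherit this handler, so each module can log under its own name.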

## Configuration

### Environment Variables
Key settings in `.env`:
- **Neo4j**: `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD`
- **Redis**: `REDIS_HOST`, `REDIS_PORT`
- **LLM Provider**: `DEFAULT_LLM_PROVIDER` (openai/anthropic/gemini/ollama)
- **API Keys**: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`
- **Ollama**: `OLLAMA_BASE_URL`, `OLLAMA_MODEL`, `OLLAMA_EMBEDDING_MODEL`
- **Security**: `SECRET_KEY`, `ACCESS_TOKEN_EXPIRE_MINUTES`

### Tuning Parameters
- `CHUNK_SIZE`: 1024 (text chunk size)
- `CHUNK_OVERLAP`: 200 (overlap between chunks)
- `MAX_AGENT_ITERATIONS`: 5 (max reasoning steps)
- `AGENT_TIMEOUT_SECONDS`: 30 (query timeout)
- `ENTITY_RESOLUTION_THRESHOLD`: 0.85 (similarity threshold)
- `DEFAULT_TOP_K`: 5 (retrieval results)
- `GRAPH_MAX_DEPTH`: 3 (graph traversal depth)
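
One way to load these parameters with their defaults is a small frozen dataclass. The sketch assumes the tuning parameters are read from environment variables with the names listed above; `TuningSettings` and `load_tuning_settings` are hypothetical names, and the overlap check is an added sanity guard rather than documented behavior.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class TuningSettings:
    chunk_size: int = 1024
    chunk_overlap: int = 200
    max_agent_iterations: int = 5
    agent_timeout_seconds: int = 30
    entity_resolution_threshold: float = 0.85
    default_top_k: int = 5
    graph_max_depth: int = 3

    def __post_init__(self) -> None:
        # Added guard: an overlap >= chunk size would never advance the window.
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("CHUNK_OVERLAP must be >= 0 and < CHUNK_SIZE")


def load_tuning_settings(env: Optional[Mapping[str, str]] = None) -> TuningSettings:
    """Read overrides such as CHUNK_SIZE from env, falling back to defaults."""
    source = os.environ if env is None else env
    overrides = {}
    for field in fields(TuningSettings):
        raw = source.get(field.name.upper())
        if raw is not None:
            # Cast using the type of the default value (int or float).
            overrides[field.name] = type(field.default)(raw)
    return TuningSettings(**overrides)
```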

## Deployment

### Local Development
```bash
# 1. Ensure Neo4j and Redis are running
# 2. Configure .env with connection details

# 3. Start API server
./start-server.sh  # or start-server.bat on Windows

# 4. Start workers
./start-worker.sh  # or start-worker.bat on Windows
```

### Production Considerations
1. **Database**: Use managed Neo4j (Aura) or self-hosted cluster
2. **Redis**: Use managed Redis (AWS ElastiCache, Redis Cloud)
3. **Worker Scaling**: Add more Celery workers based on ingestion load
4. **API Scaling**: Run multiple API instances behind load balancer
5. **Monitoring**: Integrate with Prometheus/Grafana for metrics
6. **Secrets**: Use secret management (AWS Secrets Manager, HashiCorp Vault)

## Extensibility

### Adding New LLM Provider
1. Implement `LLMProvider` interface
2. Add to `LLMFactory.create()` method
3. Update config with new provider settings
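
The three steps might look like this in practice. The sketch is self-contained and does not reflect the real `LLMFactory` internals: the registry, the `register()` helper, and `EchoProvider` are hypothetical, and only the `LLMProvider` method signatures and the `LLMFactory.create()` name come from this document.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class LLMProvider(ABC):
    @abstractmethod
    async def complete(self, prompt: str, **kwargs) -> str: ...

    @abstractmethod
    async def embed(self, text: str) -> List[float]: ...


class EchoProvider(LLMProvider):
    """Toy provider used only to show the shape of an implementation."""

    async def complete(self, prompt: str, **kwargs) -> str:
        return f"echo: {prompt}"

    async def embed(self, text: str) -> List[float]:
        return [float(len(text))]


class LLMFactory:
    _providers: Dict[str, Type[LLMProvider]] = {}

    @classmethod
    def register(cls, name: str, provider_cls: Type[LLMProvider]) -> None:
        cls._providers[name] = provider_cls

    @classmethod
    def create(cls, name: str) -> LLMProvider:
        try:
            return cls._providers[name]()
        except KeyError:
            raise ValueError(f"unknown LLM provider: {name}") from None


LLMFactory.register("echo", EchoProvider)
```

Usage is then `provider = LLMFactory.create("echo")` followed by `await provider.complete("hi")`, or `asyncio.run(provider.complete("hi"))` from synchronous code.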

### Adding New Graph Database
1. Implement `GraphStore` interface
2. Update `IngestionPipeline` to use new store
3. Test with existing workflows

### Custom Retrieval Tools
1. Create new tool class with `run()` method
2. Add to `AgentRetrievalSystem.tools`
3. Update routing logic in `_route_query()`
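
A custom tool only needs a `run()` method and an entry in the tool registry. In the sketch below, the synchronous `run(query)` signature, `KeywordSearchTool`, and the prefix-based `route_query` rule are all assumptions for illustration; the real `_route_query()` logic and tool signatures may differ.

```python
from typing import Dict, List, Protocol


class RetrievalTool(Protocol):
    name: str

    def run(self, query: str) -> List[str]: ...


class KeywordSearchTool:
    """Toy tool: returns documents containing any query term."""

    name = "keyword"

    def __init__(self, documents: List[str]) -> None:
        self._documents = documents

    def run(self, query: str) -> List[str]:
        terms = query.lower().split()
        return [d for d in self._documents if any(t in d.lower() for t in terms)]


def route_query(query: str, tools: Dict[str, RetrievalTool], default: str) -> RetrievalTool:
    """Toy routing rule: a '<tool name>:' prefix selects that tool, else the default."""
    prefix, _, _ = query.partition(":")
    return tools.get(prefix.strip().lower(), tools[default])
```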

## Testing Strategy

### Unit Tests
- Test each component independently
- Mock external dependencies (Neo4j, Redis, LLMs)
- Focus on business logic

### Integration Tests
- Test component interactions
- Use test database instances
- Verify end-to-end flows

### Performance Tests
- Benchmark ingestion throughput
- Measure query latencies
- Stress test with concurrent requests

## Future Enhancements

### Phase 1 (Current MVP)
- ✅ Core ingestion pipeline
- ✅ Agentic retrieval system
- ✅ Multi-LLM support
- ✅ Entity resolution
- ✅ Async workers

### Phase 2 (Next Steps)
- [ ] React frontend with visual ontology editor
- [ ] Graph visualization (D3.js/Cytoscape)
- [ ] Advanced ontology evolution with migrations
- [ ] Semantic cache with Redis
- [ ] Batch ingestion optimization

### Phase 3 (Advanced Features)
- [ ] Multi-tenant support with data isolation
- [ ] Fine-tuned entity extraction models
- [ ] Graph neural network embeddings
- [ ] Automated ontology quality metrics
- [ ] Export/import ontology schemas

## Troubleshooting

### Common Issues

#### Neo4j Connection Failed
- Verify Neo4j is running and accessible
- Verify credentials in `.env`
- Try connecting with cypher-shell: `cypher-shell -u neo4j -p password`

#### Celery Worker Not Processing
- Check Redis is running: `redis-cli ping`
- Verify broker URL in `.env`
- Check worker logs

#### Ollama Models Not Found
- Pull models: `ollama pull llama3.2 && ollama pull bge-m3`
- Verify Ollama is running: `curl http://localhost:11434/api/tags`

#### Query Returns No Results
- Verify documents are ingested: `GET /api/system/stats`
- Check ontology exists: `GET /api/ontology`
- Try simpler queries first

## Support

For issues or questions:
1. Check documentation and troubleshooting guide
2. Search existing GitHub issues
3. Open new issue with:
   - Clear description
   - Steps to reproduce
   - Environment details
   - Relevant logs

---

**Last Updated**: February 2026
**Version**: 0.1.0